Abstract

Motivated by the growing interest in today’s massive parallel computing capabilities, we
analyze a queueing network with many servers in parallel to which jobs arrive according
to a Poisson process. Each job, upon arrival, is split into several pieces which are randomly
routed to specific servers in the network, without centralized information about the status of the
servers’ individual queues. The main feature of this system is that the different pieces of a job
must initiate their service in a synchronized fashion. Moreover, the system operates on a FCFS
basis. The synchronization and service discipline create blocking and idleness among the servers,
which is compensated by the fast service time attained through the parallelization of the work.
We analyze the stationary waiting time distribution of jobs under a many-server limit and
provide exact tail asymptotics; these asymptotics generalize the celebrated Cramér-Lundberg
approximation for the single-server queue.

Keywords: Queueing networks with synchronization, many-server queue, Cramér-Lundberg
approximation, high-order Lindley equation, cloud computing, MapReduce, weighted branching
processes, stochastic fixed-point equations, large deviations.

1 Introduction

Cloud computing is an emerging paradigm for accessing shared computing and storage resources 
over the internet. “Clouds” consist of hundreds of thousands of servers that provide scalable,
on-demand data storage and processing capacity to end users. Operators of these large facilities achieve
economies of scale and can make efficient use of their computing infrastructure by optimizing how 
tasks are processed across an interconnected and distributed computer network. End users of cloud 
computing benefit from reduced capital expenditures and from the flexibility that the scalable 
paradigm offers. Worldwide demand for cloud computing has experienced rapid growth that is 
expected to continue into the foreseeable future. 

Motivated by this rapidly emerging phenomenon, we analyze a queueing model for a large network 
of parallel servers. Throughout the paper we use the generic term server to represent a computing 
unit, e.g., a computer or a processor in the network. Jobs arrive to the network at random times 
and are split at the time of arrival into a number of pieces. These pieces are then immediately 
assigned to randomly selected servers, where they join the corresponding queues. The main reason 
for assigning these job fragments to specific servers is that maintaining a single centralized queue
is not scalable in systems of this size, and keeping information about the individual status of 
each queue to make informed routing decisions can be costly. In systems where data locality is 
important, or where certain tasks need to be done at specific servers, the random routing can be 
used to approximately model storage, or specific resources, that are randomly spread out through 
the network. The service requirements of each of the pieces of a job are allowed to be random 
and possibly dependent. The main distinctive features of this model are: 1) all the pieces of a job
must begin their service at the same time, i.e., in a synchronized fashion; 2) jobs are processed
on a first-come-first-served (FCFS) basis, i.e., each of the individual queues at the servers follows a
FCFS service discipline. We refer to these two characteristics as the synchronization and fairness
requirements. Figure 1 depicts our model.

The fairness requirement is common in many queueing systems where jobs originate from different 
users, while the job synchronization is a distinctive characteristic of this model that allows us 
to incorporate the need to exchange information among the different pieces of a job during their 
processing. This is certainly the case for many scientific applications that involve simulations of 
complex systems, including: wireless networks, neuronal networks, bio-molecular and biological 
systems. Alternatively, our model also provides an approximation for systems where job fragments 
need to be joined after being processed, which will be discussed in detail in Section 1.1. We point
out that the synchronization of the different pieces of a job can in principle be attained without 
the need of centralized information, since the specific server assignment occurs upon arrival, and 
each piece would only need to keep track of its “sibling” fragments, e.g., once a fragment is ready 
to initiate its service it can notify its siblings, and the last one to do so determines when the job 
can start processing. 

The fairness and synchronization requirements create, nonetheless, blocking and idleness that are 
not present in other distributed systems, e.g., multi server queues where the different fragments of a 
job can be processed independently, and can therefore be thought of as batch arrivals. This lack of 
efficiency is compensated by the service speed attained through the parallel processing, which can 
be considerable for very large jobs. To illustrate this gain in processing speed we have compared 
in Section 4 (Table 1) the sojourn times of jobs in our model and in a comparable multiserver
queue where jobs are not split into pieces. A detailed discussion about this comparison can be
found in Section 4, but for now it suffices to mention that our model dramatically outperforms the
non-distributed system in spite of the blocking and suboptimal routing. 

We analyze in this paper the stationary waiting time of jobs (excluding service) in an asymptotic 
regime where the arrival rate of jobs and the number of servers grow to infinity, but the sizes of 
jobs and the service requirements of the individual fragments remain constant. For simplicity, we 
will refer to this type of limit as a “many server asymptotic regime”, not to be confused with the 
Halfin-Whitt regime used in multiserver queueing systems [?]. In particular, after establishing
sufficient conditions for the stability of the finite system, we show that the limiting stationary 
waiting time W is given by the endogenous solution to the following high-order Lindley equation: 

$$W \stackrel{\mathcal{D}}{=} \left( \max_{1 \le i \le N} \left( W_i + \chi_i - \tau_i \right) \right)^+, \qquad (1.1)$$

where the $\{W_i\}_{i \ge 1}$ are i.i.d. copies of $W$, independent of $(N, \chi_1, \tau_1, \chi_2, \tau_2, \dots)$, $N$ is the number
of pieces of a job (allowed to be random), $\tau_i$ is the limiting interarrival time between piece $i$ and
the job immediately in front of it at its assigned queue, and $\chi_i$ is the service time of the fragment
of the job in front of the $i$th piece; $\stackrel{\mathcal{D}}{=}$ stands for equality in distribution and $x^+ = \max\{0, x\}$. Note
that for $N \equiv 1$, (1.1) reduces to the classical Lindley equation, satisfied by the GI/GI/1 queue.
Recursion (1.1) was termed “high-order Lindley equation” and studied in the context of queues
with synchronization in [22], although only for deterministic $N$.

Figure 1: Queueing model for a server cloud where jobs need to be synchronized. In this figure, the
yellow, purple and orange jobs are being processed, while the pink and brown jobs have completed
their service; two of the three brown pieces were processed at servers not depicted in the diagram.
Note that the last server at the bottom will remain idle until the yellow and purple jobs complete
their service, and the first server at the top will need to wait for both the orange and light blue
jobs to be done before starting to process the red job. The blue job in queue at the third server
from the top has only one piece and can begin its processing as soon as the yellow job leaves.
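
A distributional equation such as (1.1) can be explored numerically. The following Python sketch uses a pool-resampling iteration (often called population dynamics), which is not a method taken from this paper: it starts from $W \equiv 0$, mimicking a system that starts empty, and repeatedly applies the map in (1.1) to a finite pool of samples. The function name and the three sampler arguments are hypothetical, and the pairs $(\chi_i, \tau_i)$ are assumed to be i.i.d. and independent of $N$.

```python
import random


def population_dynamics(sample_n, sample_chi, sample_tau,
                        pool_size=1000, iterations=20, rng=None):
    """Iterate the map in (1.1) on a pool of samples, starting from W = 0.

    sample_n, sample_chi, sample_tau are hypothetical callables taking a
    random.Random instance and returning one draw of N, chi and tau.
    """
    rng = rng or random.Random()
    pool = [0.0] * pool_size  # W^(0) = 0: the system starts empty
    for _ in range(iterations):
        new_pool = []
        for _ in range(pool_size):
            best = 0.0  # positive part: x^+ = max{0, x}
            for _ in range(sample_n(rng)):
                w_i = rng.choice(pool)  # stands in for an i.i.d. copy of W
                best = max(best, w_i + sample_chi(rng) - sample_tau(rng))
            new_pool.append(best)
        pool = new_pool
    return pool


# Illustrative parameters only: N = 2 pieces, Exp(4) service, Exp(1) interarrival.
demo = population_dynamics(lambda r: 2,
                           lambda r: r.expovariate(4.0),
                           lambda r: r.expovariate(1.0),
                           pool_size=200, iterations=10,
                           rng=random.Random(0))
```

The returned pool is only a rough empirical approximation of the law of $W$; its accuracy depends on the pool size and on the number of iterations, neither of which is analyzed here.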


Moreover, by applying the main result in [20] for the maximum of the branching random walk,
we provide the exact asymptotics for the tail distribution of $W$. To explain the significance of
this result it is worth considering first a single-server queue with renewal arrivals and i.i.d. service
requirements, for which it is well known that the stationary waiting time, $W_{GI/GI/1}$, satisfies

$$P\left( W_{GI/GI/1} > x \right) = P\left( \max_{k \ge 1} S_k > x \right),$$

where $S_k = X_1 + \dots + X_k$ is a random walk with i.i.d. increments satisfying $E[X_1] < 0$. Using the
ladder heights of $\{S_k\}$ and renewal theory yields the celebrated Cramér-Lundberg approximation

$$P\left( W_{GI/GI/1} > x \right) \sim H_{GI/GI/1} \, e^{-\theta x}, \qquad x \to \infty,$$

where $0 < H_{GI/GI/1} < \infty$ is a constant that can be written in terms of the limiting excess of the
renewal process defined by the ladder heights, and $\theta > 0$, known as the Cramér-Lundberg root,
solves $E[e^{\theta X_1}] = 1$ and satisfies $0 < E[X_1 e^{\theta X_1}] < \infty$ (see [7], Chapter XIII, for more details).
Throughout the paper, $f(x) \sim g(x)$ as $x \to \infty$ stands for $\lim_{x \to \infty} f(x)/g(x) = 1$.
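
For a concrete instance of the Cramér-Lundberg root, the sketch below solves $E[e^{\theta X_1}] = 1$ by bisection. It is an illustration under assumptions not made in the text: the helper names are hypothetical, and the example takes $X_1 = \chi - \tau$ with $\chi$ and $\tau$ independent exponentials with rates $\mu$ and $\lambda < \mu$ (the M/M/1 queue), for which $E[e^{\theta X_1}] = \frac{\mu}{\mu - \theta} \cdot \frac{\lambda}{\lambda + \theta}$ and the root is known in closed form, $\theta = \mu - \lambda$.

```python
def cramer_lundberg_root(mgf, upper, tol=1e-12):
    """Bisection for the positive root of mgf(theta) = 1 on (0, upper).

    Assumes mgf is convex with mgf(0) = 1 and negative slope at 0 (negative
    drift), so that mgf <= 1 exactly on (0, root], and that mgf(upper) > 1.
    """
    if mgf(upper) <= 1.0:
        raise ValueError("mgf(upper) must exceed 1 to bracket the root")
    lo, hi = 0.0, upper
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mgf(mid) > 1.0:
            hi = mid  # mid lies to the right of the root
        else:
            lo = mid  # mid lies at or to the left of the root
    return 0.5 * (lo + hi)


def mm1_mgf(theta, lam=1.0, mu=2.0):
    """E[exp(theta * (chi - tau))] for chi ~ Exp(mu), tau ~ Exp(lam), theta < mu."""
    return (mu / (mu - theta)) * (lam / (lam + theta))


theta = cramer_lundberg_root(mm1_mgf, upper=1.999)  # closed form: mu - lam = 1
```

The same routine applies to any increment distribution whose moment generating function can be evaluated, provided the bracket `upper` is chosen inside the region where it is finite.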

Back to the asymptotic behavior of $W$ in this paper, Theorem 3.4 in [20] states that for $\theta > 0$
satisfying the conditions

$$E\left[ \sum_{i=1}^N e^{\theta(\chi_i - \tau_i)} \right] = 1 \qquad \text{and} \qquad 0 < E\left[ \sum_{i=1}^N (\chi_i - \tau_i) \, e^{\theta(\chi_i - \tau_i)} \right] < \infty,$$

we have that

$$P(W > x) \sim H e^{-\theta x} \qquad (1.2)$$

as $x \to \infty$ for some constant $0 < H < \infty$. In other words, the queueing system with parallel servers
and synchronization requirements in this paper naturally generalizes the single-server queue, as
well as its Cramér-Lundberg approximation.

1.1 Split-merge queues and MapReduce 

As mentioned earlier, our model depicts a queueing network with many parallel servers where 
incoming jobs consist of a random number of pieces, to be processed in parallel by a randomly chosen 
subset of servers, under the constraint that all the pieces must begin their service simultaneously. 
This setup is very closely related to a queueing network known as a split-merge queue ([?]).
The main difference between a split-merge queue and the model described in this paper is that
in the former the synchronization occurs once all the pieces of a job have completed their service. 
More precisely, job fragments are allowed to start their processing as soon as their assigned server 
becomes available, but will continue blocking it, even after having completed their service, until all 
other pieces of the same job have completed theirs. Split-and-merge queues have been used for the 
analysis of “redundant arrays of independent disks (RAID)” in [17, 24], and constitute a special
case of a fork-join network where there are no output buffers. Hence, split-merge queues provide 
natural upper bounds for their corresponding fork-join counterparts. One important feature of our 
model compared to split-merge queues, or even the more general fork-join networks, is that much of 
the existing literature assumes that all jobs have the same number of pieces, which is equal to the 
number of servers in the network (usually small). Our model, which is meant to model networks 
with a very large number of parallel servers, allows jobs to have different number of pieces with 
heterogeneous service requirements. 

A popular distributed algorithm that can be modeled using a split-merge queue is MapReduce 
([?]), and its open source implementation, Hadoop. The main idea of MapReduce/Hadoop is to
divide large data sets into smaller units, then process these smaller units on a large number of 
parallel servers and finally assemble the partial answers into the final solution. The initial phase 
of this framework, called the mapping, divides a new job into tasks/files of similar size, e.g., 64 
(or 128) megabytes (MB) in size. Irrespective of the size of the original job, all smaller tasks are 
of equal size - 64 (or 128) MBs, but the number of these tasks and the servers to which they are 
assigned depends on the original job size. After (or in some implementations during) the execution 
of the mapping phase the system begins a shuffling phase, which is then followed by a reducing 
phase, again on a number of parallel servers. The reduce phase merges the partial answers from the 
processed tasks into one final answer. Often, the completion of a job consists of several repetitions 
in sequence of the map-reduce process. 

To explain the similarities between a split-merge queue and MapReduce it is worth elaborating some
more on its synchronization and blocking characteristics. In the original FCFS implementation, if 
there are two jobs A and B arriving in that order, all mapping tasks for job A will execute before 
any mapping tasks from job B can begin. As tasks from job A are finishing their mapping phase 
and moving on to the reduce phase, job B mapping tasks are scheduled. This is similar to the 
synchronized server assignment in a split-merge queue. Furthermore, due to the observation that 
reducers need to have all the outputs from the mapping phase in order to perform their work, job 
B reducers cannot start until all job A reducers and all job B mappers have finished. Therefore, 
there is a natural queueing system with blocking of servers. In the Hadoop framework, a job needs 
both mappers and reducers available in order to start processing, and the reducers can only begin 
once all (some) mappers have finished, hence blocking servers for other jobs that arrived at a later 
time. The blocking, or starvation, problem is a known shortcoming of MapReduce (see, e.g., [?]).

It follows that a MapReduce implementation with FCFS scheduling can be well described by a split- 
merge queue with synchronization whenever the servers remain blocked until all the subtasks are 
rejoined. Section 4 shows numerical evidence supporting the use of our model for approximating a
split-and-merge queue. The main advantage of using our model for this purpose is its mathematical 
tractability, since as our main results show, the waiting time distribution as well as all its moments 
can be accurately estimated when the number of servers is large. Our model can also incorporate 
more realistic features such as jobs of different sizes and dependent, non-identically distributed 
service requirements for their subtasks. 

1.2 Related Literature 

The model studied in this paper has connections to a vast literature, and therefore we restrict this 
section to only a few closely related models and some recent work on the applications mentioned 
above. 

a.) Distributed queueing models with synchronized service. We start with the model
considered in [16], which studies a queueing system where each job requires a synchronous
execution on a random number of parallel servers. Some applications mentioned there are:
the deployment of fire engines in firefighting, jury selection, and the staffing of surgeons and
medical personnel in emergency surgery. The main difference in [16] from our setup, besides
the restriction to i.i.d. exponentially distributed service times, is that we assign the pieces of
a job to specific servers at the time of arrival, while the model in [16] waits until the required
number of servers is free and then assigns the pieces to these servers. We point out that this
is equivalent to having perfect information about the workloads at each server and routing
the pieces to the servers with the smallest workloads. Therefore, the model in [16] provides
a benchmark that can be used to quantify the value of having centralized information.

Some of the ideas used in the proof of the main result in this paper are borrowed from 
[22], where the authors considered a queueing system with m different types of servers and n 
identical servers of each type (for a total of $m \times n$ servers), and where each arriving job requires
service from exactly one server of each type, i.e., each job needs $m$ parallel servers, and is
assigned upon arrival to one of the $n$ possible choices for each type. Theorem 2 in [22] shows
that the steady-state distribution of the waiting time converges weakly, as $n \to \infty$, to the
endogenous solution of the high-order Lindley equation (1.1) with $N \equiv m$. Besides allowing
N to be random and the service requirements of the fragments of a job to be dependent, this 
paper shows not only the weak convergence of the steady-state waiting time, but also of all 
its moments. The proof technique used in this paper, which is based on a coupling using the 
Wasserstein distance, is new, and is responsible for the stronger mode of convergence. 


A third related model is the one considered in [[9] , which can be thought of as a stylized 
version of our queueing network where server assignment is not done uniformly at random,
but rather according to a distribution on specific subsets of servers, e.g., blocks of adjacent 
or closely located servers. The setting there corresponds to all the fragments of a job having 
identical service requirements, but provides interesting insights into the existence of stationary 
distributions for different server assignment rules. 


b.) Fork-join queues with synchronization. The analysis of distributed computer systems
with synchronization constraints such as MapReduce/Hadoop, RAID, or even online retail,
constitutes an active area of research. From a queueing theory perspective, perhaps the most
widely used model is that of a fork-and-join queue ([?], [25, 32], [?]), which as pointed out
earlier differs from our model in two ways: 1) the synchronization occurs once all the pieces
of a job have completed their service, and 2) there may be output buffers after each queue to
prevent the servers from being blocked. Moreover, fork-join queues are in general difficult to
analyze, with much of the literature focusing on approximating mean sojourn times of jobs
([27], [?]). Exact analytical results exist only for a 2-server system with
i.i.d. exponential service requirements ([?]), and most of the remaining approximations
do not scale well as the number of servers grows large. Heavy-traffic approximations can be
found in [37, 28].

More tangentially related, we also mention that a considerable amount of work related to the 
modeling of MapReduce and other distributed computer platforms is done from the scheduling 
perspective. We refer the interested reader to [?] and the references therein.

c.) Resource allocation problems. Although only briefly, we mention that the model in this
paper is also related to virtual path allocation problems in communication networks, where
incoming calls request a specific set of links in the network (virtual path) to establish a
communication channel, and if any of the requested links is unavailable at that time the call
is lost (see, e.g., [?]). In a queueing interpretation of this model, one can think of calls as
jobs, links as servers, and the duration of the call as the common service requirement of the 
job fragments. Existing work in this area has been focused on the blocking-loss model, where 
the main performance measure is the loss probability. Hence, our model can be thought of 
as a variation of this model where queueing is allowed and the number of links is very large. 
More generally, our model can provide valuable insights into a wider class of multi-resource 
allocation problems, such as those appearing in service engineering and consulting project 
management, provided the number of resources is sufficiently large to justify the limiting 
regime. 


The remainder of the paper is organized as follows. Section 2 contains the mathematical description
of our queueing model; Section 3 describes the analysis of the stationary waiting time of jobs in the
network, with the main result of this paper in Section 3.1 and the tail asymptotics of the limiting
waiting time in Section 3.2. Section 4 contains the numerical experiments mentioned earlier, and
Section 5 gives some concluding remarks. Finally, the proofs of the two theorems are given in
Section 6.


2 Model description 

We consider a sequence of queueing networks indexed by their number of servers, $n$. The
$n$ servers are identical and operate in parallel. Arrivals to the $n$th network occur according to a
Poisson process with rate $\lambda n$ for some parameter $\lambda > 0$. Each job, upon arrival to the network,
is split into a random number of pieces, usually proportional to the total service requirement of
the job. The size of a job, i.e., the number of pieces into which it is split, is determined by some
distribution $f_n(k)$, $k = 1, 2, \dots, m_n$, where $m_n$ is a bound on the number of pieces a job can have
and is chosen to satisfy $m_n \le n$; this condition ensures that each piece can be routed to a different
server. Once a job has been split, say into $k$ pieces, its fragments are routed randomly to $k$ different
servers in the network (i.e., with all $n!/(n-k)!$ possible assignments equally likely), forming a queue
at their assigned servers. An equivalent way of describing the arrival of jobs into the $n$th network
is to use the thinning property of the Poisson process and think of independent Poisson processes,
each generating jobs of size $k$, $k = 1, 2, \dots, m_n$, at rate $f_n(k) \lambda n$.
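
The arrival and routing mechanism just described can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper: the function name, the dictionary representation of the size distribution, and the finite time horizon are all assumptions made here. Sampling $k$ servers without replacement, in random order, makes every one of the $n!/(n-k)!$ ordered assignments equally likely.

```python
import random


def generate_jobs(n, lam, size_pmf, horizon, rng=None):
    """Generate (arrival_time, tag) pairs for the n-server network on [0, horizon].

    size_pmf is a hypothetical dict {k: f_n(k)}; tags are tuples of k distinct
    servers drawn uniformly from {1, ..., n}.
    """
    rng = rng or random.Random()
    sizes = sorted(size_pmf)
    if sizes[-1] > n:
        raise ValueError("job sizes must not exceed the number of servers")
    weights = [size_pmf[k] for k in sizes]
    jobs, t = [], 0.0
    while True:
        t += rng.expovariate(lam * n)  # Poisson arrivals with total rate lam * n
        if t > horizon:
            return jobs
        k = rng.choices(sizes, weights=weights)[0]   # job size drawn from f_n
        tag = tuple(rng.sample(range(1, n + 1), k))  # k distinct servers
        jobs.append((t, tag))


jobs = generate_jobs(n=10, lam=0.5, size_pmf={1: 0.5, 3: 0.5},
                     horizon=20.0, rng=random.Random(1))
```

Service requirements are deliberately left out of this sketch, since the text allows them to be dependent within a job and only requires independence from the job size.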

The service times of the different fragments of a job are assumed to have a general distribution, 
although the stability condition for the model will implicitly impose that they have finite exponential 
moments. Moreover, they are not assumed to be identically distributed and are allowed to be 
dependent, although we do require that they be independent of the number of pieces. More precisely, 
a typical job has $N$ pieces having service requirements $(\chi^{(1)}, \dots, \chi^{(N)})$, where $N$ has distribution
$f_n$, and $\chi^{(j)}$ is independent of $N$ for each $j$. Since the random routing eliminates information about
the order of the pieces, the relevant distribution that will appear in the analysis of the model is 
that of a randomly chosen fragment, which we denote B. The randomness of the fragments’ service 
requirements can be used to include rounding effects and small variations on the type of processing 
that they need. The sizes of jobs and of the service requirements of their fragments are assumed 
to be independent of the arrival process. 

In order to model the synchronization and fairness characteristics of the network, we will assign to 
each job a tag (not to be confused with the label that will be introduced later). More precisely, a 
job having $k$ pieces receives a tag of the form $(s_1, s_2, \dots, s_k)$, $s_i \in \{1, 2, \dots, n\}$ for all $i$, $s_i \neq s_j$ for
$i \neq j$, representing the different servers to which its fragments are sent for processing.

Definition 2.1 We say that a job having tag $\mathbf{r} = (r_1, r_2, \dots, r_l)$ is a predecessor of a job having
tag $\mathbf{s} = (s_1, s_2, \dots, s_k)$ if it arrived before the job having tag $\mathbf{s}$ and they have at least one server in
common (i.e., $r_i = s_j$ for some $1 \le i \le l$ and $1 \le j \le k$). We use the term immediate predecessor
if there are no pieces of other jobs in between the two jobs at the server they have in common.

In terms of this definition, the synchronization rule is that the job having tag $\mathbf{s} = (s_1, s_2, \dots, s_k)$
cannot begin its service, which is to be done in parallel by servers $s_1, s_2, \dots, s_k$, until all its immediate
predecessors have completed their service. The fairness rule says that if the job with tag $\mathbf{r}$ is
a predecessor of the job with tag $\mathbf{s}$, then it will begin its service before the job with tag $\mathbf{s}$ does.
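
The two rules translate into a simple recursion for the start times in a finite network. The sketch below is an illustration under assumptions made here (the function name and the list-of-tuples job format are hypothetical): each job starts at the first instant at which it has arrived and every server in its tag has finished the piece queued directly in front of it, and each server is then held for that server's own piece.

```python
def start_times(jobs, n):
    """Start times under synchronized FCFS service.

    jobs: list of (arrival_time, tag, services) sorted by arrival time, where
    tag is a tuple of distinct servers in 1..n and services[i] is the service
    requirement of the piece sent to server tag[i].
    """
    free_at = [0.0] * (n + 1)  # free_at[s]: completion time of the last piece queued at s
    starts = []
    for arrival, tag, services in jobs:
        # Synchronization: all pieces wait for the slowest of their servers.
        start = max([arrival] + [free_at[s] for s in tag])
        for s, x in zip(tag, services):
            free_at[s] = start + x  # server s is held for its own piece only
        starts.append(start)
    return starts


example = [
    (0.0, (1, 2), (5.0, 1.0)),  # job A
    (1.0, (2, 3), (2.0, 2.0)),  # job B: server 2 frees up at time 1
    (2.0, (1, 3), (1.0, 1.0)),  # job C: must wait for A's long piece at server 1
]
starts = start_times(example, n=3)  # [0.0, 1.0, 5.0]
```

In this small example server 3 finishes job B at time 3 but sits idle until time 5, when job C can finally start on servers 1 and 3 together; this is the blocking and idleness that the synchronization requirement creates.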

Formally, we can think of the $n$th system as a superposition of $\sum_{k=1}^{m_n} n!/(n-k)!$ independent marked
point processes. Each of these processes generates jobs of size $k$ with server assignments $(s_1, \dots, s_k)$,
according to a Poisson process with rate $f_n(k) \lambda n / (n!/(n-k)!)$, and having i.i.d. marks
$(\chi^{(1)}, \dots, \chi^{(k)})$ corresponding to the service requirements of the fragments.
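
As a quick consistency check (a worked identity added here, not taken from the paper), summing the rates of all the component processes recovers the total arrival rate $\lambda n$, since for each size $k$ there are $n!/(n-k)!$ ordered server assignments:

```latex
\sum_{k=1}^{m_n} \frac{n!}{(n-k)!} \cdot \frac{f_n(k)\,\lambda n}{n!/(n-k)!}
  \;=\; \lambda n \sum_{k=1}^{m_n} f_n(k) \;=\; \lambda n .
```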


To establish stability, we follow the standard queueing theory technique of assuming that at time $t_0 < 0$ there
are no jobs in the network, and then look at the waiting time of the first job to arrive after time
zero. We will refer to this job as the “tagged” job, and we will prove that its waiting time, after
having taken the limit $t_0 \to -\infty$, is finite almost surely. The stability will then follow from
Loynes’ lemma and Palm theory (see [8] for more details on this general technique).


To analyze the waiting time of the tagged job we look at a graph containing all the information 
of which jobs need to complete their service before the tagged job can initiate its own. We now 
describe how to construct such a graph, called in [22] a predecessor graph.


2.1 The predecessor graph 

To construct the predecessor graph we look at time in reverse, starting from the time the tagged
job arrived, say $T_1 > 0$, and ending at time $t_0$. The tagged job, which we will label $\emptyset$, is split into a
random number of pieces, say $N_\emptyset = N_\emptyset^{(n)}$, where $N_\emptyset$ is distributed according to $f_n$. Each of these
pieces will be routed to one of the $n$ servers in the network, where it will either find the server
empty or join a queue. Suppose that the tagged job needs to be processed by servers $(s_1, \dots, s_{N_\emptyset})$,
and recall from Definition 2.1 that any job that is directly in front of the queue at any of the servers
$s_i$, $1 \le i \le N_\emptyset$, is an immediate predecessor of the tagged job. To construct the first set of edges in
the graph we draw an edge from the tagged job to its immediate predecessors. Moreover, each edge
is assigned a vector of the form $(\hat\tau_i, \chi_i)$, $1 \le i \le N_\emptyset$, where $\hat\tau_i$ is the interarrival time between the
tagged job and its $i$th immediate predecessor, and $\chi_i$ is the service requirement of the piece of the
immediate predecessor that is in front of the corresponding piece of the tagged job. Also, if a job
is an immediate predecessor of more than one fragment of the tagged job, say it requires service at
servers $s_i$ and $s_j$, then $\hat\tau_i = \hat\tau_j$, although we may still have $\chi_i \neq \chi_j$ with $\chi_i$, $\chi_j$ possibly dependent.
Finally, if a piece of the tagged job finds its server empty upon arrival, then there is simply no edge
to be drawn. Hence, the number of outbound edges of the tagged job is at most $N_\emptyset$.

Iteratively, once we have identified all the immediate predecessors of the tagged job we repeat
the process described above with each one of them. We will call the predecessor graph $\mathcal{G}_n(t_0)$,
since it will depend on both the number of servers $n$ and the time $t_0$ at which the system starts
empty. Since $\mathcal{G}_n(t_0)$ will resemble a tree, it will be useful to use tree notation to refer to the
predecessors of the tagged job. More precisely, let $\mathbb{N}_+ = \{1, 2, 3, \dots\}$ be the set of positive integers
and let $U = \bigcup_{r=0}^\infty (\mathbb{N}_+)^r$ be the set of all finite sequences $\mathbf{i} = (i_1, i_2, \dots, i_r) \in U$, where by convention
$(\mathbb{N}_+)^0 = \{\emptyset\}$ contains the null sequence $\emptyset$. To ease the exposition, for a sequence $\mathbf{i} = (i_1, i_2, \dots, i_k) \in U$
we write $\mathbf{i}|t = (i_1, i_2, \dots, i_t)$, provided $k \ge t$, and $\mathbf{i}|0 = \emptyset$ to denote the index truncation at
level $t$, $k \ge 0$. To simplify the notation, for $\mathbf{i} \in \mathbb{N}_+$ we simply use $\mathbf{i} = i_1$, that is, without
the parentheses. Also, for $\mathbf{i} = (i_1, \dots, i_k)$ we will use $(\mathbf{i}, j) = (i_1, \dots, i_k, j)$ to denote the index
concatenation operation; if $\mathbf{i} = \emptyset$, then $(\mathbf{i}, j) = j$.
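
The label notation maps directly onto tuples. In the sketch below, which uses hypothetical helper names chosen here, the null sequence $\emptyset$ is the empty tuple, truncation is slicing, and concatenation appends one index; a label of length one, written simply $j$ in the text, is represented by the one-element tuple `(j,)`.

```python
EMPTY = ()  # the null sequence, i.e. the label of the tagged job


def truncate(label, t):
    """Index truncation i|t: keep the first t entries (requires t <= len(label))."""
    if not 0 <= t <= len(label):
        raise ValueError("truncation level must be between 0 and the label length")
    return label[:t]


def concat(label, j):
    """Index concatenation (i, j): the j-th immediate predecessor of job i."""
    return label + (j,)


child = concat(concat(EMPTY, 2), 1)  # the label (2, 1)
parent = truncate(child, 1)          # the label (2,), written 2 in the text
```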

Now recall that $\emptyset$ denotes the tagged job and label its immediate predecessors $i$, with $1 \le i \le N_\emptyset$.
The jobs in the next level of predecessors will have labels of the form $(i_1, i_2)$, and in general, any job
in the predecessor graph will have a label of the form $\mathbf{i} = (i_1, i_2, \dots, i_k)$, $k \ge 1$. With this notation,
$N_{\mathbf{i}}$ denotes the number of pieces that the job with label $\mathbf{i}$ in the graph is split into, $\hat\tau_{(\mathbf{i},j)}$ will denote
the interarrival time between job $\mathbf{i}$ and its $j$th immediate predecessor (a job with label $(\mathbf{i}, j)$), and
$\chi_{(\mathbf{i},j)}$ denotes the service requirement of the fragment immediately in front of the queue of the $j$th
piece of job $\mathbf{i}$. Note that the tag of a job, which contains the specific server assignments, allows us
to identify the immediate predecessors of a given job, but it plays no role afterwards. Therefore,
we will use the labels, not tags, to identify jobs in the predecessor graph. See Figure 2.

Figure 2: The predecessor graph $\mathcal{G}_n(t_0)$. The numbers in each node indicate the size of the job
(number of pieces). Some nodes have fewer outbound edges than the size of the job, meaning that
the corresponding piece found its server empty upon arrival. Nodes with multiple inbound edges
correspond to jobs that are immediate predecessors to more than one job in the graph. Vertical
lines indicate the time of arrival of each job; service requirements for the pieces of a job can be
thought of as “edge attributes” and cannot be read from the graph. This graph is consistent with
Figure 1 by letting the tagged job be the three-piece black one.

We point out that in case a job is an immediate predecessor to more than one job in the graph 
(or to more than one piece of the same job), the corresponding edges will merge into the common 
predecessor. Moreover, in this case, the common predecessor is assigned more than one label (e.g., 
if a job is an immediate predecessor to both jobs $\mathbf{i} = (i_1, \dots, i_k)$ and $\mathbf{j} = (j_1, \dots, j_l)$, then such a
job can be identified by two different labels, one of the form $(\mathbf{i}, s)$ and another of the form $(\mathbf{j}, t)$).
Furthermore, the merged paths and the subgraph they define from that point onwards will have
multiple labels as well. See Figure 3.

Remark 2.2 We allow multiple labels for jobs that are immediate predecessors to more than one 
job (or more than one piece of the same job), since each path leading to a job represents a distinct 
sequence of fragments, with its own waiting time. 

Remark 2.3 As pointed out earlier, the concept of the predecessor graph was introduced in [22],
although in less detail than here, e.g., the labeling of jobs is not rigorous and there is no mention of
multiple labels. In our case, the new coupling technique that we use, which yields a stronger mode
of convergence in the main theorem and allows us to take $N$ to be random, requires the more careful
treatment described above.

Figure 3: Multiple labels due to common immediate predecessors. Excerpt from the predecessor
graph in Figure 2 showing the labeling of the jobs.


3 Analysis of the steady-state waiting time 


To analyze the waiting time of the tagged job we now derive a high-order Lindley recursion. To
this end, let $W_{\mathbf{i}}^{(n,t_0)}$ denote the waiting time of the job having label $\mathbf{i}$ in the network with $n$ servers
that starts empty at time $t_0$. We also define $A_0 = \{\emptyset\}$, and $A_r = \{(i_1, \dots, i_r) \in \mathcal{G}_n(t_0)\}$ for
$r \ge 1$, to be the set of labels in the predecessor graph at graph distance $r$ from the tagged job, i.e.,
labels whose corresponding job is connected to the tagged job by a directed path of length $r$. For
$\mathbf{i} \in A_k$ and $r \ge 1$ let

$$B_{\mathbf{i},r} = \left\{ (\mathbf{i}, i_{k+1}, \dots, i_{k+r}) \in A_{k+r} \right\}$$

be the set of labels at distance $r$ from $\mathbf{i}$, with the simplified notation $B_{\mathbf{i}} = B_{\mathbf{i},1}$.

We then have that the tagged job’s waiting time is given by 


$$W_\emptyset^{(n,t_0)} = \max\left\{ 0, \, \max_{\mathbf{i} \in B_\emptyset} \left( \chi_{\mathbf{i}} - \hat\tau_{\mathbf{i}} + W_{\mathbf{i}}^{(n,t_0)} \right) \right\}, \qquad (3.1)$$

with the boundary condition that the first job to arrive after time $t_0$, and any other job that arrives
thereafter and is the first one to use its assigned servers, will have a waiting time of zero (recall
that the system is empty at time $t_0$).

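To analyze (3.1) define $X_\emptyset = 0$ and $X_{\mathbf{i}} = \chi_{\mathbf{i}} - \hat\tau_{\mathbf{i}}$ for $\mathbf{i} = (i_1, \dots, i_k)$. Next, let

$$\kappa = \max\left\{ r \in \mathbb{N}_+ : |B_{\emptyset,r}| > 0 \right\},$$

where $|A|$ denotes the cardinality of set $A$. Note that $\kappa$ is a random variable and corresponds to the
maximum length of any directed path in $\mathcal{G}_n(t_0)$. Furthermore, for any $\mathbf{i} \in A_\kappa$ we have $W_{\mathbf{i}}^{(n,t_0)} = 0$,
and therefore, for any $\mathbf{i} \in A_{\kappa-1}$,

$$W_{\mathbf{i}}^{(n,t_0)} = \max\left\{ 0, \, \max_{\mathbf{j} \in B_{\mathbf{i}}} X_{\mathbf{j}} \right\}.$$

Similarly, iterating (3.1) we obtain for $\mathbf{i} \in A_{\kappa-2}$,

$$\begin{aligned}
W_{\mathbf{i}}^{(n,t_0)} &= \max\left\{ 0, \, \max_{\mathbf{j} \in B_{\mathbf{i}}} \left( X_{\mathbf{j}} + W_{\mathbf{j}}^{(n,t_0)} \right) \right\} \\
&= \max\left\{ 0, \, \max_{\mathbf{j} \in B_{\mathbf{i},1}} X_{\mathbf{j}}, \, \max_{\mathbf{j} \in B_{\mathbf{i},2}} \left( X_{\mathbf{j}|(\kappa-1)} + X_{\mathbf{j}} \right) \right\}.
\end{aligned}$$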

In general, after iterating (3.1) k times we obtain 


Wn H,t °) = max { 0, max X.-,..., max ( X:ii + X:i 2 + • 
1 ies 0il jeB 0 , K V Jl Jl 


.• + XJ 


(3.2) 


Having now a recursive equation for the waiting time, we need to identify conditions under which
this queueing system will be stable, and then describe the stationary distribution of the waiting
time. The key idea to solve both problems is that the predecessor graph is very close to being a
tree; more precisely, the only thing that prevents it from being a tree is the occasional arrival of
a job that is an immediate predecessor to two or more jobs in $\mathcal{G}_n(t_0)$. It turns out that under the
scaling we consider in our model (arrival rate equal to $\lambda n$), the probability of this occurring within
the timeframe needed for the tagged job to start its service is very small (geometrically small).
Once we show that this is the case, taking the limit as $t_0 \to -\infty$ will yield the stability of the
network.


To describe the stationary distribution of the waiting time we first observe that, provided the first
time that two paths in the predecessor graph merge occurs after the tagged job has initiated its
service, we have that the $\hat{\tau}$'s will be i.i.d. exponential random variables with some rate $\lambda_n^*$ and the
$\chi$'s will be i.i.d. with distribution $B$ (since they will all belong to different jobs). It follows that
under the same conditions that guarantee the stability of the network we would have that, after
taking the limit as $t_0 \to -\infty$ and the number of servers $n \to \infty$, $W_{\emptyset}^{(n,t_0)}$ would have to converge to
a solution to the stochastic fixed-point equation

$$W \stackrel{\mathcal{D}}{=} \max\left\{ 0, \max_{1 \leq i \leq N} \left( \chi_i - \tau_i + W_i \right) \right\}, \tag{3.3}$$

where $\{W_i\}$ are i.i.d. copies of $W$, independent of $(N, \chi_1, \tau_1, \chi_2, \tau_2, \dots)$, with $N$ having distribution
$f = \lim_{n \to \infty} f_n$, the $\chi$'s i.i.d. having distribution $B$, and the $\tau$'s i.i.d. exponential random variables
with rate $\lambda^* = \lim_{n \to \infty} \lambda_n^*$, all random variables independent of each other. Note also that we have
replaced the set over which the maximum is computed, $\mathbf{i} \in B_{\emptyset}$, with $1 \leq i \leq N$, since in stationarity
all the pieces of a job have an immediate predecessor.


It turns out that (3.3) has multiple solutions, unlike the standard Lindley equation for $N = 1$. It
is the structure of (3.2) that will allow us to identify the correct one. As we will see in the following
sections, the appropriate solution is the so-called endogenous one, which is also the minimal one in
the usual stochastic order sense.


To identify the value of $\lambda^*$ recall that in the time reversed setting, for the system with $n$ servers, we
can think of independent Poisson processes each generating jobs with a tag of the form $(s_1, s_2, \dots, s_k)$,
for $1 \leq k \leq m_n$, and $s_i \in \{1, 2, \dots, n\}$ for all $i$, at rate

$$\lambda_k = \frac{f_n(k) \lambda n}{n!/(n-k)!}.$$

Moreover, a piece of a job requiring service at server $s_i$ can have as an immediate predecessor any
job of any size requiring service at server $s_i$. In particular, there are a total of $\binom{n-1}{k-1} k!$ possible
predecessors of size $k$, and therefore, the inter arrival time between the piece of the job requiring
service from server $s_i$ and its predecessor is exponentially distributed with rate

$$\lambda_n^* = \sum_{k=1}^{m_n} \lambda_k \binom{n-1}{k-1} k! = \sum_{k=1}^{m_n} \frac{f_n(k) \lambda n}{n!/(n-k)!} \binom{n-1}{k-1} k! = \lambda \sum_{k=1}^{m_n} k f_n(k). \tag{3.4}$$

It follows that, assuming $f_n$ is uniformly integrable,

$$\lambda^* = \lambda \sum_{k=1}^{\infty} k f(k) = \lambda E[N].$$

Equation (3.3) is known in the literature (see, e.g., [22, 20]) as a high-order Lindley equation, and the
behavior of its endogenous solution is given in Section 3.2.
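
The simplification in (3.4) can be checked numerically. The sketch below evaluates the sum over job sizes with exact rational arithmetic and compares it with $\lambda \sum_k k f_n(k)$. The job-size distribution, the value of $\lambda$ and the number of servers are made-up illustrative values, and the function name `predecessor_rate` is hypothetical; none of them come from the paper.

```python
from fractions import Fraction
from math import comb, factorial


def predecessor_rate(f_n, lam, n):
    """Evaluate sum_k lambda_k * C(n-1, k-1) * k!, the middle expression in (3.4).

    f_n maps a job size k (1 <= k <= n) to its probability, lam is the
    per-server arrival rate, and n is the number of servers.
    """
    total = Fraction(0)
    for k, prob in f_n.items():
        ordered_tags = Fraction(factorial(n), factorial(n - k))  # n!/(n-k)!
        lam_k = prob * lam * n / ordered_tags                    # rate of one tag
        total += lam_k * comb(n - 1, k - 1) * factorial(k)       # tags using a fixed server
    return total


# Illustrative (assumed) inputs, not taken from the paper.
n = 12
lam = Fraction(1, 10)
f_n = {1: Fraction(1, 2), 2: Fraction(1, 4), 5: Fraction(1, 4)}

mean_size = sum(k * p for k, p in f_n.items())
rate = predecessor_rate(f_n, lam, n)
print(rate, lam * mean_size)
```

Because every term reduces to $\lambda k f_n(k)$, the two printed values agree for any choice of $n \geq m_n$.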


3.1 Main result 


Before we formulate the main result of this paper it is convenient to specify the conditions we need
to impose on $f_n$, $\lambda$, and $B$. Recall that the service requirements of a job of size $k$, $(\chi^{(1)}, \dots, \chi^{(k)})$,
are allowed to be arbitrarily dependent, say having some joint distribution $B_k(\mathbf{x})$ on $\mathbb{R}_+^k$, but are
assumed to be independent of the number of pieces. In this notation, $B$ is the distribution of the
service requirement of a randomly chosen piece, i.e.,

m n k i 

B (x ) = X] 2 *; p (a: w ^ x )- 

k=1 i= 1 

Throughout the paper, $\Rightarrow$ denotes convergence in distribution.


Assumption 3.1 Suppose that $f_n$ is a distribution on $\{1, 2, \dots, m_n\}$, with $m_n \leq n$, $B$ is a
distribution on $\mathbb{R}_+$, and $\lambda > 0$.

i) Suppose there exists a distribution $f$ on $\mathbb{N}_+$, having finite mean, such that $f_n \Rightarrow f$ as $n \to \infty$
and $f_n$ is uniformly integrable.

ii) Suppose there exists $\beta > 0$ such that

$$E\left[ \sum_{i=1}^{N} e^{\beta(\chi_i - \tau_i)} \right] = \frac{\lambda^*}{\lambda^* + \beta} E[N] E\left[ e^{\beta \chi_1} \right] < 1,$$

where $N$ is distributed according to $f$, $\{\chi_i\}$ are i.i.d. random variables with distribution $B$,
independent of $N$, and $\{\tau_i\}$ are i.i.d. exponentially distributed random variables with rate
$\lambda^* = \lambda E[N]$, independent of $(N, \chi_1, \dots, \chi_N)$.

iii) Suppose that

$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=1}^{m_n} k^2 f_n(k) = 0.$$
To give some examples of distributions for which Assumption 3.1 is satisfied, let $N$ be distributed
according to $f$ and consider

$$f_n(k) = P(\min\{N, m_n\} = k) \qquad \text{or} \qquad f_n(k) = P(N = k \mid N \leq m_n).$$

In both cases, provided $E[N] < \infty$, we can take any $m_n \to \infty$, including $m_n = n$, and have $f_n$
uniformly integrable, since the monotone convergence theorem gives $E[\min\{N, m_n\}] \to E[N]$ and
$E[N \mid N \leq m_n] = E[N 1(N \leq m_n)]/P(N \leq m_n) \to E[N]$. Finally, Assumption 3.1 (iii) would be
satisfied in both examples, with $m_n = n$, if $E[N^{1+\epsilon}] < \infty$ for some $\epsilon > 0$ (which we may take to satisfy
$\epsilon \leq 1$), since then, using $k^{1-\epsilon} \leq n^{1-\epsilon}$ for $k \leq n$,

$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=1}^{n} k^2 f_n(k) \leq \lim_{n \to \infty} \frac{n^{1-\epsilon}}{n} \sum_{k=1}^{n} k^{1+\epsilon} f_n(k) = E[N^{1+\epsilon}] \lim_{n \to \infty} n^{-\epsilon} = 0.$$

In case $E[N^{1+\epsilon}] = \infty$ for all $\epsilon > 0$, one would need to take $m_n = o(n)$ to obtain

$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=1}^{m_n} k^2 f_n(k) \leq \lim_{n \to \infty} \frac{m_n}{n} \sum_{k=1}^{m_n} k f_n(k) = E[N] \lim_{n \to \infty} \frac{m_n}{n} = 0.$$


We are now ready to formulate the main theorem. 


Theorem 3.2 Let $W_{\emptyset}^{(n,t_0)}$ denote the waiting time, excluding service, of the tagged job (the first
job to arrive after time zero) when we start the system empty at time $t_0 < 0$ and the network
consists of $n$ servers. Suppose that

$$E[\tilde{N}] E\left[ e^{\beta(\chi - \tilde{\tau})} \right] < 1 \tag{3.5}$$

for some $\beta > 0$, where $\tilde{N}$ has distribution $f_n$, $\chi$ has distribution $B$ and $\tilde{\tau}$ is exponentially distributed
with rate $\lambda_n^*$ and is independent of $\chi$. Then, for any fixed number of servers $n$,

$$\lim_{t_0 \to -\infty} W_{\emptyset}^{(n,t_0)} = W^{(n)} \qquad \text{a.s.}$$

for some finite random variable $W^{(n)}$. Moreover, provided Assumption 3.1 is satisfied,

$$W^{(n)} \Rightarrow W,$$

as $n \to \infty$, where $W$ is the endogenous solution to (3.3). Furthermore, for any $p > 0$,

$$E\left[ (W^{(n)})^p \right] \to E[W^p] < \infty, \qquad n \to \infty.$$


The key idea for the proof of the stability result is to couple the predecessor graph with a weighted
branching tree ([29, 18]) and show that the waiting time of the tagged job is dominated by the
maximum of the random walks along all the paths of the tree. The identification of the limit with
the endogenous solution to the high-order Lindley equation (3.3) will follow from a similar coupling
argument between the predecessor graph and a weighted branching tree, in which we will show
that with high enough probability the tagged job will initiate its service before we observe the
first merging of paths. This critical timescale at which the first merging of paths is observed also
explains why the dependence among the different service requirements of a job plays no role, since
a typical job will only “see” one fragment from each of its predecessors still present in the system
when it arrives. This dependence does however impact the sojourn time, i.e., the time a job spends
in the system from the moment it arrives until all its fragments complete service, which in the limit
is given by

$$T = \max\left\{ 0, \max_{1 \leq i \leq N} \left( \chi_i - \tau_i + W_i \right) \right\} + \max_{1 \leq j \leq N} \chi^{(j)}, \tag{3.6}$$

where $(\chi^{(1)}, \dots, \chi^{(k)})$ is distributed according to $B_k(\mathbf{x})$ and is independent of $(N, \{\chi_i\}, \{\tau_i\}, \{W_i\})$,
and the $\{W_i\}$ are i.i.d. copies of the endogenous solution to (3.3).


3.2 Analyzing the limit: Generalized Cramer-Lundberg approximation 


As stated in Theorem 3.2, the stationary waiting time in the system with $n$ servers converges
to the endogenous solution to the stochastic fixed-point equation (3.3), which receives its name
since it can be explicitly constructed on a weighted branching process. For completeness, we now
briefly describe the construction of a weighted branching tree, which is more general than the setup
considered in this paper.


Let $(Q, N, C_1, C_2, \dots)$ be a vector with $N \in \mathbb{N} \cup \{\infty\}$, and $Q$, $\{C_i\}$ real-valued; the interpretation of
$Q$ and the $\{C_i\}$ depends on the application. Given a sequence of i.i.d. vectors
$\{(Q_{\mathbf{i}}, N_{\mathbf{i}}, C_{(\mathbf{i},1)}, C_{(\mathbf{i},2)}, \dots)\}_{\mathbf{i} \in U}$ having the same distribution as the generic branching vector
$(Q, N, C_1, C_2, \dots)$, we use the random variables $\{N_{\mathbf{i}}\}_{\mathbf{i} \in U}$ to determine the structure of a tree as
follows. Let $A_0 = \{\emptyset\}$ and

$$A_r = \{ (\mathbf{i}, i_r) \in U : \mathbf{i} \in A_{r-1}, 1 \leq i_r \leq N_{\mathbf{i}} \}, \qquad r \geq 1, \tag{3.7}$$

be the set of individuals in the $r$th generation. Next, assign to each node $\mathbf{i}$ in the tree a weight $\Pi_{\mathbf{i}}$
according to the recursion

$$\Pi_{\emptyset} \equiv 1, \qquad \Pi_{(\mathbf{i}, i_r)} = C_{(\mathbf{i}, i_r)} \Pi_{\mathbf{i}}.$$

Each weight $\Pi_{\mathbf{i}}$ is also usually multiplied by its corresponding value $Q_{\mathbf{i}}$ to construct solutions to
non-homogeneous stochastic fixed-point equations.

In the general formulation, the vector $(Q, N, C_1, C_2, \dots)$ is allowed to be arbitrarily dependent,
although for the special case appearing in this paper we will have $N < \infty$ a.s., $Q \equiv 1$, and the $\{C_i\}$
nonnegative, i.i.d., and independent of $N$. For more details we refer the reader to [22].


To make the connection between the high-order Lindley equation (3.3) and the main result in [20],
let $R = e^{W}$, $R_i = e^{W_i}$, $Q \equiv 1$, and $C_i = e^{\chi_i - \tau_i}$ to obtain

$$R \stackrel{\mathcal{D}}{=} Q \vee \left( \bigvee_{i=1}^{N} C_i R_i \right), \tag{3.8}$$

where $x \vee y$ denotes the maximum of $x$ and $y$. We refer to (3.8) with a generic branching vector of
the form $(Q, N, C_1, C_2, \dots)$ with the $\{C_i\}$ nonnegative, and the $\{R_i\}$ i.i.d. copies of $R$ independent
of $(Q, N, C_1, C_2, \dots)$, as the branching maximum equation.
It is easy to verify, as was done in [20], that the random variable

$$R = \bigvee_{r=0}^{\infty} \bigvee_{\mathbf{j} \in A_r} \Pi_{\mathbf{j}} Q_{\mathbf{j}}$$

is a solution to (3.8), known in the literature as the endogenous solution. Moreover, when
$Q \geq 0$, the endogenous solution is also the minimal one in the usual stochastic order sense (see
also the survey paper [1] for additional references and a wide variety of max-plus equations).

Taking logarithms on both sides of (3.8) (with $Q \equiv 1$), we obtain that the endogenous solution to
(3.3) is given by

$$W = \bigvee_{r=0}^{\infty} \bigvee_{\mathbf{j} \in A_r} S_{\mathbf{j}}, \tag{3.9}$$

where $S_{\emptyset} = 0$, $S_{\mathbf{j}} = \log \Pi_{\mathbf{j}} = X_{\mathbf{j}|1} + X_{\mathbf{j}|2} + \dots + X_{\mathbf{j}}$ for $\mathbf{j} \neq \emptyset$, and $X_{\mathbf{i}} = \chi_{\mathbf{i}} - \tau_{\mathbf{i}}$. Furthermore, it was
shown in [20] (see Lemma 3.1) that this endogenous solution is finite almost surely provided

$$E\left[ \sum_{i=1}^{N} e^{\beta X_i} \right] < 1$$

for some $\beta > 0$, which we will refer to as the stability condition. Note that with respect to the
queueing model in this paper, this stability condition implies the usual “load condition”, i.e., arrival
rate divided by service rate strictly smaller than one, which in this case would be $\lambda E[N] E[\chi] =
E[\chi]/E[\tau] < 1$; the two are equivalent when $E[N] = 1$.
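
To illustrate how (3.9) relates to the fixed-point equation (3.3), the sketch below samples a weighted branching tree truncated at a fixed number of generations and evaluates it in two ways: by the recursion $W = \max\{0, \max_i (X_i + W_i)\}$ applied from the leaves up, and by taking the maximum of the path sums $S_{\mathbf{j}}$ over all nodes. The offspring law (uniform on $\{1, 2, 3\}$), the U(0,1) service times, the exponential rate and the function names are all assumptions chosen only for illustration.

```python
import random


def sample_tree(depth, rng, rate=1.0):
    """Sample a tree truncated at `depth` generations.

    A tree is a list of (x, subtree) pairs, one per child, where
    x = chi - tau is the increment attached to that child.
    """
    if depth == 0:
        return []
    children = []
    for _ in range(rng.choice([1, 2, 3])):          # assumed offspring law
        x = rng.random() - rng.expovariate(rate)    # chi ~ U(0,1), tau ~ Exp(rate)
        children.append((x, sample_tree(depth - 1, rng, rate)))
    return children


def w_recursive(tree):
    """Truncated version of (3.3): W = max(0, max_i (X_i + W_i))."""
    return max([0.0] + [x + w_recursive(sub) for x, sub in tree])


def w_path_max(tree, prefix_sum=0.0):
    """Truncated version of (3.9): maximum of S_j over all nodes, with S_root = 0."""
    best = prefix_sum
    for x, sub in tree:
        best = max(best, w_path_max(sub, prefix_sum + x))
    return best


rng = random.Random(0)
tree = sample_tree(6, rng)
print(w_recursive(tree), w_path_max(tree))
```

The two evaluations coincide up to floating-point rounding, which is the finite-depth analogue of the statement that (3.9) solves (3.3).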


Remark 3.3 The stability condition guarantees that $W$, as defined by (3.9), is finite almost surely.
Moreover, by Theorem 4 in [?], the existence of $\beta > 0$ such that $E\left[ \sum_{i=1}^{N} e^{\beta X_i} \right] \leq 1$ is the
corresponding necessary condition (since $P(\max_{1 \leq i \leq N} (\chi_i - \tau_i) > 0) > 0$). However, we do not consider
in this paper the boundary condition where $E\left[ \sum_{i=1}^{N} e^{\beta X_i} \right] = 1$ for some $\beta > 0$ but
$E\left[ \sum_{i=1}^{N} e^{\theta X_i} \right] > 1$ for all $\theta \neq \beta$.


By rewriting $W$ as

$$W = \max\left\{ 0, \max_{\mathbf{j} \in A_1} X_{\mathbf{j}}, \max_{\mathbf{j} \in A_2} \left( X_{\mathbf{j}|1} + X_{\mathbf{j}} \right), \max_{\mathbf{j} \in A_3} \left( X_{\mathbf{j}|1} + X_{\mathbf{j}|2} + X_{\mathbf{j}} \right), \dots \right\}$$

the similarities with (3.2) become apparent. To give some additional intuition as to why (3.9) is
the appropriate solution, it is helpful to recall the $N = 1$ case, where Lindley’s equation is known
to have a unique solution whenever $E[X_1] < 0$. Moreover, as mentioned in the introduction, this
solution can be expressed in terms of the supremum of the random walk $S_k = X_1 + \dots + X_k$, $S_0 = 0$.
A standard proof of this relation consists in iterating the recursion

$$W_{n+1} = \max\{0, X_n + W_n\}, \qquad W_0 = 0,$$

to obtain

$$W_{n+1} = \max\{0, X_n, X_n + X_{n-1}, \dots, X_n + X_{n-1} + \dots + X_0\} \stackrel{\mathcal{D}}{=} \max_{0 \leq k \leq n+1} S_k.$$
It follows by taking the limit as $n \to \infty$ on both sides that the stationary waiting time in the FCFS
GI/GI/1 queue satisfies

$$W \stackrel{\mathcal{D}}{=} \max_{k \geq 0} S_k.$$
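
The pathwise identity behind this argument is easy to check on simulated data. The sketch below iterates the single-server Lindley recursion from $W_0 = 0$ and compares the result with the maximum of the partial sums of the increments taken in reverse order. The increment distribution (a U(0,1) service time minus an exponential inter arrival time with rate 1.5) and the function names are illustrative assumptions.

```python
import random


def lindley(increments):
    """Iterate W_{k+1} = max(0, X_k + W_k) starting from W_0 = 0."""
    w = 0.0
    for x in increments:
        w = max(0.0, x + w)
    return w


def max_reversed_partial_sum(increments):
    """Return max(0, X_n, X_n + X_{n-1}, ..., X_n + ... + X_0)."""
    best = running = 0.0
    for x in reversed(increments):
        running += x
        best = max(best, running)
    return best


rng = random.Random(42)
# Assumed increments: service time U(0,1) minus an Exp(1.5) inter arrival time.
xs = [rng.random() - rng.expovariate(1.5) for _ in range(1000)]
print(lindley(xs), max_reversed_partial_sum(xs))
```

The two quantities agree path by path (up to rounding); equality in distribution with $\max_k S_k$ then follows because the reversed increments have the same joint law as the original ones when they are i.i.d.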


It is then to be expected that the asymptotic analysis of the waiting time in the single-server
queue can also be generalized to the branching setting. This is indeed the case, as was recently
shown in [20]. There, for the endogenous solution to the general branching maximum recursion
(3.8), it was shown that under the root condition $E\left[ \sum_{i=1}^{N} C_i^{\theta} \right] = 1$ and the derivative condition
$0 < E\left[ \sum_{i=1}^{N} C_i^{\theta} \log C_i \right] < \infty$, we have that

$$P(R > x) \sim H x^{-\theta}, \qquad x \to \infty, \tag{3.10}$$

for some constants $\theta, H > 0$. Note that for the high-order Lindley equation (3.3), this condition
translates into the existence of a root $\theta > 0$ such that $E\left[ \sum_{i=1}^{N} e^{\theta X_i} \right] = 1$. The power-law
asymptotics of R are a consequence of the Implicit Renewal Theorem on Trees from USUIS], which 
constitutes a powerful tool for the analysis of many different types of branching recursions, e.g., 
the maximum recursion mm), the linear recursion or smoothing transform (01313 HIM), 
the discounted tree sum (ID), etc. This theorem is in turn a generalization of the Implicit Renewal 
Theorem of m for non-branching recursions, which can be used to analyze the random coefficient 
autoregressive process of order one and the reflected random walk, among others. The name
“implicit” refers to the fact that the Renewal Theorem is applied to a random variable R (e.g., the
solution to a stochastic fixed-point equation) without having knowledge of its distribution, which 
in turn leads to the resulting constant H in the asymptotics to be implicitly defined in terms of R 
itself. 

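We conclude this section with the theorem describing the asymptotic behavior of $W$, the endogenous
solution to (3.3).

Theorem 3.4 Let $W$ be given by (3.9), with $N$ distributed according to $f$, $\{\chi, \chi_i\}$ i.i.d. with
common distribution $B$, and $\{\tau_i\}$ i.i.d. exponentially distributed with rate $\lambda^*$; all random variables
independent of each other. Suppose that for some $\theta > 0$, $E[N] E[e^{\theta \chi}] \lambda^* / (\lambda^* + \theta) = 1$ and
$0 < E[e^{\theta \chi} \chi] - E[e^{\theta \chi}]/(\lambda^* + \theta) < \infty$. In addition, assume that for some $\epsilon > 0$, $E[\,\cdots\,] < \infty$.
Then,

$$P(W > x) \sim H e^{-\theta x}, \qquad x \to \infty,$$

where $0 < H < \infty$ is given by

$$H = \frac{(\lambda^* + \theta)^2 \, E\left[ 1 \vee \bigvee_{i=1}^{N} e^{\theta(\chi_i - \tau_i + W_i)} - \sum_{i=1}^{N} e^{\theta(\chi_i - \tau_i + W_i)} \right]}{\theta \lambda^* E[N] \left( (\lambda^* + \theta) E[e^{\theta \chi} \chi] - E[e^{\theta \chi}] \right)},$$

with the $\{W_i\}$ i.i.d. copies of $W$, independent of $(N, \chi_1, \tau_1, \dots, \chi_N, \tau_N)$.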


The constant $H$ in the asymptotic tail of $W$ can be computed via simulation, for example, by using
the algorithm recently developed in [12], which can be used to generate the $\{W_i\}$ appearing in the
expectation.
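
The decay rate $\theta$ in Theorem 3.4, unlike $H$, only requires solving the scalar root condition. The following sketch does this for U(0,1) service times, for which $E[e^{\theta \chi}] = (e^{\theta} - 1)/\theta$. The left-hand side of the root condition is convex in $\theta$, so the code first locates its minimizer by ternary search and then bisects on the increasing branch, which is the branch where the derivative condition of the theorem holds. The parameter values $E[N] = 2$ and $\lambda^* = 0.2$ are an illustrative choice, and the function names and search bounds are assumptions.

```python
import math


def rho(theta, mean_n, lam_star):
    """E[N] E[exp(theta*chi)] lam*/(lam* + theta) for chi ~ U(0,1), theta > 0."""
    mgf = math.expm1(theta) / theta
    return mean_n * mgf * lam_star / (lam_star + theta)


def cramer_root(mean_n, lam_star, upper=50.0, iters=200):
    """Return the root of rho(theta) = 1 on the increasing branch of rho."""
    # Ternary search for the minimizer of the convex function rho.
    lo, hi = 1e-9, upper
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if rho(m1, mean_n, lam_star) < rho(m2, mean_n, lam_star):
            hi = m2
        else:
            lo = m1
    theta_min = 0.5 * (lo + hi)
    if rho(theta_min, mean_n, lam_star) >= 1.0:
        raise ValueError("stability condition fails: rho never drops below 1")
    if rho(upper, mean_n, lam_star) <= 1.0:
        raise ValueError("increase `upper`: no sign change on the search interval")
    # Bisection on [theta_min, upper], where rho increases through 1.
    a, b = theta_min, upper
    for _ in range(iters):
        mid = 0.5 * (a + b)
        if rho(mid, mean_n, lam_star) < 1.0:
            a = mid
        else:
            b = mid
    return 0.5 * (a + b)


theta = cramer_root(mean_n=2.0, lam_star=0.2)
print(theta, rho(theta, 2.0, 0.2))
```

When $E[N] > 1$ the equation has a second, smaller positive root on the decreasing branch; that root violates the derivative condition and is deliberately skipped here.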
    Job size        Arrival
  E[N]     β       rate (λn)    Model          Mean sojourn time    95% C.I.

  2        2/3     100          SyncB          0.7290               [0.6843, 0.7737]
                                Split-Merge    0.7389               [0.6943, 0.7836]
                                M/G/n          0.9994               [0.9206, 1.0782]

  10       6       6.5          SyncB          1.1201               [1.0654, 1.1748]
                                Split-Merge    1.1116               [1.0599, 1.1634]
                                M/G/n          4.9916               [4.6585, 5.3247]

  100      66      0.06         SyncB          1.0138               [0.9995, 1.0281]
                                Split-Merge    1.0140               [0.9995, 1.0284]
                                M/G/n          49.7684              [46.8525, 52.6844]

Table 1: Mean sojourn time. Simulated results for the average sojourn time of jobs in the SyncB,
Split-Merge and M/G/n models; in all models the number of servers is $n = 1000$, the arrival rate of
jobs is $\lambda n$, indicated by the table, the service requirements of the pieces of a job are i.i.d. U(0,1),
and the number of pieces is computed as $N = \tilde{N} \wedge n$, with $\tilde{N} - 1$ a mixed Poisson random variable
with Pareto($\alpha$, $\beta$) rate, $\alpha = 3$, and $\beta$ according to the table. For the M/G/n queue the service
requirement of a job is the sum of the requirements of its pieces. All three simulations were run
using the same arrivals and jobs. Simulations were run for a total of 30,000 jobs.

4 Numerical experiments 

In this section of the paper we provide some numerical experiments comparing our model to two
others: a non-distributed multiserver system and a split-merge queue. All the results in this section
were obtained using discrete-event simulation, starting with an empty system.

Throughout this section, we refer to the model studied in this paper as the synchronize at the
beginning (SyncB) model. We also consider a split-merge queue (Split-Merge) where incoming jobs
are split upon arrival into a number of pieces, and then assigned to randomly selected servers, each
of which operates on a FCFS basis. To explain how the synchronization occurs, assume that a job
fragment that has completed service does not leave the server until all other pieces of the same
job are done as well (since there are no output buffers in the system). Unlike in the SyncB model,
fragments that have reached the front of their queues and find their servers available can begin
processing immediately.

Our first set of results compares the SyncB, Split-Merge and M/G/n models, all run with the same
Poisson arrivals and job distributions. For the M/G/n model the service distribution is that of the
sum of all the pieces in a job. The purpose of including the M/G/n queue in this comparison is
to illustrate the gain attained by distributing the work among parallel servers, which, as Table 1
shows, outweighs the loss of efficiency due to the blocking. In all the experiments, we focus on the
sojourn time of jobs, i.e., the amount of time a job spends in the system, from the time it arrives
to the time it completes its service and leaves. Table 1 shows simulated values for the expected
stationary sojourn time in a network with $n$ servers along with 95% approximate confidence intervals
(the parameters lie within the theoretical stable region for both the SyncB and M/G/n models;
we do not have a criterion for the stability of the Split-Merge model but the simulated results
are consistent with stationarity). As we can see from the table, the two distributed systems are
comparable, and better than the non-distributed M/G/n queue. We point out that for the M/G/n
queue it can be verified that under the scaling considered here, the waiting time converges to
zero, which reduces the sojourn time to essentially the service time of a job, i.e., a quantity of the
form $\sum_{i=1}^{N} \chi^{(i)}$, whereas in the SyncB model the waiting time is non-zero but the service time is
$\bigvee_{i=1}^{N} \chi^{(i)}$. In our experiments we have used i.i.d. uniform service times for the job fragments and a
heavy-tailed mixed Poisson for the number of pieces, in which case $\sum_{i=1}^{N} \chi^{(i)}$ has a power-law tail
whereas $\bigvee_{i=1}^{N} \chi^{(i)}$ is bounded, which leads to a considerable number of jobs experiencing very long
sojourn times in the non-distributed system.

[Figure 4 plot: two panels titled “Sojourn time tail distribution”, horizontal axis x; recovered panel
label: E[N] = 2, β = 2/3, λn = 100.]

Figure 4: Sojourn time tail distribution. Simulated values of the tail distributions for the SyncB
and Split-Merge models. In both cases, the number of servers is $n = 1000$, the arrival rate of jobs
is $\lambda n$, indicated on the plots, the service requirements of the pieces of a job are i.i.d. U(0,1), and
the number of pieces is computed as $N = \tilde{N} \wedge n$, with $\tilde{N} - 1$ a mixed Poisson random variable with
Pareto($\alpha$, $\beta$) rate, $\alpha = 3$, and $\beta$ indicated on the plots. Both simulations were run using the same
arrivals and jobs for a total of 30,000 jobs. The tail distribution of the limiting sojourn time $T$ is
provided for comparison.
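
The job-size distribution used in these experiments can be sampled with a few lines of code. The sketch below assumes that Pareto($\alpha$, $\beta$) denotes the classical Pareto law with shape $\alpha$ and minimum value $\beta$ (mean $\alpha \beta / (\alpha - 1)$); this reading is an assumption, although it reproduces the values of E[N] reported in Table 1 (for instance, $\alpha = 3$ and $\beta = 2/3$ give a mean rate of 1 and hence E[N] = 2 before truncation). The function name and the sample size are illustrative.

```python
import random


def sample_job_size(alpha, beta, n, rng):
    """Draw min(1 + Poisson(L), n), where L is Pareto with shape alpha and scale beta."""
    rate = beta * rng.paretovariate(alpha)   # paretovariate returns values >= 1
    # Poisson(rate): count unit-rate exponential arrivals falling in [0, rate].
    count = 0
    clock = rng.expovariate(1.0)
    while clock <= rate:
        count += 1
        clock += rng.expovariate(1.0)
    return min(1 + count, n)


rng = random.Random(1)
sizes = [sample_job_size(3.0, 2.0 / 3.0, 1000, rng) for _ in range(20000)]
print(sum(sizes) / len(sizes))  # empirical mean; close to E[N] = 2 under the stated assumption
```

Counting exponential arrivals takes time proportional to the sampled rate, which is acceptable here because rates far above the mean are rare; a dedicated Poisson sampler would be preferable for much larger $\beta$.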

Our second set of results compares the tail distributions of the sojourn times in the SyncB and
Split-Merge models. Figure 4 depicts two comparisons, one for $E[N] = 2$ and one for $E[N] = 10$. The
parameters used in both plots are within the stability region for the SyncB model. As can be seen
from the figure, the distributions of the SyncB and Split-Merge models are indistinguishable, which
strongly supports the use of the SyncB model for approximating the mathematically intractable
Split-Merge model. In other words, SyncB seems to provide an accurate model for MapReduce with
random routing and FCFS scheduling, under light loads. We have also included the tail distribution
of the limiting sojourn time (3.6), computed using the algorithm in [12]. The approximation works
best for small values of $E[N]$ and light loads (i.e., $\lambda E[N] E[\chi]$ small).
[Figure 5 plot: panel titled “Mean sojourn time at stability boundary”.]

Figure 5: Mean sojourn time. Simulated values of the average sojourn time of the first $t$ jobs for
the SyncB and Split-Merge models. In both cases, the number of servers is $n = 1000$, the arrival
rate of jobs is $\lambda n$, with $\lambda = 0.2095$, the service requirements of the pieces of a job are i.i.d. U(0,1),
and the number of pieces is computed as $N = \tilde{N} \wedge n$, with $\tilde{N} - 1$ a mixed Poisson random variable
with Pareto($\alpha$, $\beta$) rate, $\alpha = 3$, and $\beta = 2/3$ ($E[N] = 2$). The parameters are such that the SyncB
model is at its theoretical boundary of stability. Both simulations were run using the same arrivals
and jobs for a total of 30,000 jobs.


The last numerical result in the paper compares the SyncB and Split-Merge models right at the
theoretical stability boundary of the SyncB model. Figure 5 plots the running average of the
sojourn times of the first $t$ jobs, for up to 30,000 jobs. The plot is consistent with both models
being unstable, and the important insight we obtain is that the stability regions for the two models
seem to be very close, if not the same. This observation provides further support for the use of
SyncB as a qualitatively good approximation for Split-Merge.
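
The location of the stability boundary can be reproduced from the stability condition of Section 3.2. With U(0,1) service times and $\lambda^* = \lambda E[N]$, the condition asks for some $\beta > 0$ with $E[N] \frac{e^{\beta} - 1}{\beta} \frac{\lambda^*}{\lambda^* + \beta} < 1$. The sketch below minimizes the left-hand side over $\beta$ (it is convex in $\beta$) and then bisects on $\lambda$ to find where the minimum reaches one. It uses only $E[N] = 2$, that is, it ignores the truncation of $N$ at $n$; the function names and search intervals are assumptions.

```python
import math


def rho(beta, mean_n, lam):
    """E[N] E[exp(beta*chi)] lam*/(lam* + beta), chi ~ U(0,1), lam* = lam * E[N]."""
    lam_star = lam * mean_n
    return mean_n * (math.expm1(beta) / beta) * lam_star / (lam_star + beta)


def min_rho(mean_n, lam, upper=20.0, iters=200):
    """Minimize the convex function beta -> rho(beta) on (0, upper] by ternary search."""
    lo, hi = 1e-9, upper
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if rho(m1, mean_n, lam) < rho(m2, mean_n, lam):
            hi = m2
        else:
            lo = m1
    return rho(0.5 * (lo + hi), mean_n, lam)


def critical_lambda(mean_n, iters=100):
    """Largest lam for which min_beta rho < 1, found by bisection on lam."""
    lo, hi = 1e-6, 2.0 / mean_n   # the load condition lam * E[N] * E[chi] < 1 bounds lam
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if min_rho(mean_n, mid) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


lam_c = critical_lambda(2.0)
print(lam_c)  # lands close to the value 0.2095 quoted in the caption of Figure 5
```

For comparison, the usual load condition alone would allow any $\lambda < 1$ when $E[N] = 2$ and $E[\chi] = 1/2$, which illustrates how much more restrictive the stability condition of the SyncB model is.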


5 Concluding remarks 

The model presented in this paper captures the complexity of a large network of parallel servers 
with blocking and synchronization constraints. Although not an exact model for any of today’s 
existing distributed computing algorithms, our model provides a good approximation for split-merge 
queues, which can be used to model the popular MapReduce algorithm under random routing and 
FCFS scheduling. More importantly, our model is analytically tractable, and hence provides a 
valuable benchmark for studying other distributed queueing models. In particular, one can think 
of the random routing in our model (i.e., the way in which job pieces are assigned to randomly 
selected servers) as a “blind” scheduling rule, since it requires no information about the availability 
or the workload of any of the servers. On the other extreme, the model in [16] corresponds to
the “optimal” scheduling rule, where job fragments are assigned to the servers with the smallest
workloads. It follows that the “blind” and “optimal” models can be used in conjunction to provide 
a good cost-benefit analysis of any other scheduling rule (e.g., send to idle servers first, or select 
twice as many servers as needed and choose those with the smallest workloads/number of jobs). 
In other words, our model can be used to price server information. Furthermore, our model yields 
valuable insights about general split-merge queues, which are important since they provide upper 
bounds for various performance measures in today’s popular fork-join networks. In particular, the 
unusual and rather stringent stability condition of our model highlights how critical the scheduling 
discipline is, since the “optimal” model in [16] is stable under the usual load condition (arrival rate 
divided by service rate strictly smaller than one). Hence, the analysis in this paper motivates the 
search for “easily implementable” scheduling rules that can recover the weaker stability condition. 


6 Proofs 


This section contains the proofs of Theorem 3.2 and Theorem 3.4. To ease the exposition we
separate Theorem 3.2 into two parts, the first one concerning the existence of a stationary waiting
time for a fixed number of servers and a fixed arrival rate of jobs, the second one establishing the
limiting distribution of the stationary waiting time as the number of servers and the arrival rate of
jobs grow to infinity.


We start by summarizing some of the notation that will be used throughout this section, starting
with all the random variables involved in the predecessor graph. Let

$$\hat{U}_0 = 0, \qquad \hat{U}_r = \max_{\mathbf{j} \in \hat{A}_r} \hat{S}_{\mathbf{j}}, \qquad r \geq 1,$$

where $\hat{S}_{\mathbf{j}} = \hat{X}_{\mathbf{j}|1} + \hat{X}_{\mathbf{j}|2} + \dots + \hat{X}_{\mathbf{j}}$, $\hat{X}_{\mathbf{i}} = \chi_{\mathbf{i}} - \hat{\tau}_{\mathbf{i}}$, and $\hat{A}_r$ is the set of labels (not jobs) in the
predecessor graph at graph distance $r$ from the tagged job. Note that $|B_{\emptyset,r}| \leq |\hat{A}_r|$, since $B_{\emptyset,r}$
refers to the set of labels in $\mathcal{G}_n(t_0)$, where there are some jobs/fragments that do not have any
predecessors, and as $t_0 \to -\infty$ all jobs/fragments will have one. We also point out that
since every time multiple paths in the graph merge (i.e., every time a job that is an immediate
predecessor to multiple fragments arrives) all the jobs from that point onwards will have multiple
labels, some of the $\{\hat{N}_{\mathbf{i}}\}$ will be repeated, and are therefore not independent. Similarly, the
$\{\hat{\tau}_{\mathbf{i}}\}$ correspond to the inter arrival times between jobs in the predecessor graph (the length of the
edges), and are therefore, in general, neither independent of each other nor of the $\{\hat{N}_{\mathbf{i}}\}$. More
precisely, the marginal distribution of each of the $\hat{\tau}_{\mathbf{i}}$ is exponential with rate $\lambda_n^*$, but conditionally
on knowing that $\mathbf{i}$ shares a predecessor with one or more other jobs, its rate changes and all the
inter arrival times corresponding to edges that merge into the same job become dependent. Finally,
the $\{\chi_{\mathbf{i}}\}$ are identically distributed with marginal distribution $B$, and their dependence with the
$\{\hat{\tau}_{\mathbf{i}}\}$ and $\{\hat{N}_{\mathbf{i}}\}$ is limited to the multiplicity of the labels (i.e., labels referring to edges that lie on
merged paths have the same service requirements). Nonetheless, service requirements of the form
$\chi_{\mathbf{i}}$ and $\chi_{\mathbf{j}}$ with $\mathbf{i} \neq \mathbf{j}$ may also be dependent if they correspond to fragments of the same job. We
will identify labels belonging to the same job in the predecessor graph through the equivalence
relation $\mathbf{i} \sim \mathbf{j}$.


The analysis of the predecessor graph will become tractable once we identify suitable approximations
where the merging of paths due to common predecessors does not occur, in other words, where
the predecessor graph is truly a tree. Since these approximations will be used several times in the
proofs it is convenient to define them upfront.

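In general, we will use $\tilde{N}$ to refer to a random variable having distribution $f_n$, $N$ to refer to a
random variable having distribution $f$, $\chi$ to refer to a random variable having distribution $B$, $\tilde{\tau}$
to denote an exponential random variable with rate $\lambda_n^*$, and $\tau$ to denote an exponential random
variable with rate $\lambda^*$.

Let $\{\tilde{N}_{\mathbf{i}}\}_{\mathbf{i} \in U}$ denote a sequence of i.i.d. copies of $\tilde{N}$, $\{\chi_{\mathbf{i}}\}_{\mathbf{i} \in U}$ an i.i.d. sequence of copies of $\chi$, and
$\{\tilde{\tau}_{\mathbf{i}}\}_{\mathbf{i} \in U}$ an i.i.d. sequence of copies of $\tilde{\tau}$, all independent of each other. Use the $\{\tilde{N}_{\mathbf{i}}\}$ to define a
branching process by setting $\tilde{A}_0 = \{\emptyset\}$ and $\tilde{A}_r = \{(\mathbf{i}, i_r) : \mathbf{i} \in \tilde{A}_{r-1}, 1 \leq i_r \leq \tilde{N}_{\mathbf{i}}\}$ for $r \geq 1$. Next, set
$\tilde{X}_{\mathbf{i}} = \chi_{\mathbf{i}} - \tilde{\tau}_{\mathbf{i}}$, $\tilde{S}_{\mathbf{j}} = \tilde{X}_{\mathbf{j}|1} + \tilde{X}_{\mathbf{j}|2} + \dots + \tilde{X}_{\mathbf{j}}$ and define

$$\tilde{U}_0 = 0, \qquad \tilde{U}_r = \max_{\mathbf{j} \in \tilde{A}_r} \tilde{S}_{\mathbf{j}}, \qquad r \geq 1.$$

Similarly, by repeating the construction given above after removing the ~ from all the random
variables we obtain

$$U_0 = 0, \qquad U_r = \max_{\mathbf{j} \in A_r} S_{\mathbf{j}}, \qquad r \geq 1,$$

where $S_{\mathbf{j}} = X_{\mathbf{j}|1} + X_{\mathbf{j}|2} + \dots + X_{\mathbf{j}}$ and $X_{\mathbf{i}} = \chi_{\mathbf{i}} - \tau_{\mathbf{i}}$. We point out that if $f_n = f$ for all $n \geq n_0$,
then there is no difference between all the ~ random variables and those without it.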


We are now ready to prove the first part of Theorem 3.2.

Proof of Theorem 3.2 (Stability). We need to show that provided $E[\tilde{N}] E[e^{\beta(\chi - \tilde{\tau})}] < 1$ the
limit $\lim_{t_0 \to -\infty} W_{\emptyset}^{(n,t_0)}$ exists and is finite a.s. To this end, note that since

$$W_{\emptyset}^{(n,t_0)} = \bigvee_{r=0}^{\kappa} \bigvee_{\mathbf{j} \in B_{\emptyset,r}} \left( \hat{X}_{\mathbf{j}|1} + \hat{X}_{\mathbf{j}|2} + \dots + \hat{X}_{\mathbf{j}} \right) \tag{6.1}$$

and as $t_0 \to -\infty$ we have that $\kappa \to \infty$ a.s. and $B_{\emptyset,r} \nearrow \hat{A}_r$, it follows by monotone convergence that

$$\lim_{t_0 \to -\infty} W_{\emptyset}^{(n,t_0)} = \bigvee_{r=0}^{\infty} \hat{U}_r = W^{(n)} \qquad \text{a.s.}$$

Therefore, it only remains to verify that $W^{(n)} < \infty$ a.s.

To establish the finiteness of $W^{(n)}$ we note that it suffices to show that

$$P(\hat{U}_r > 0 \text{ i.o.}) = 0.$$

This in turn will follow from the Borel-Cantelli Lemma once we show that

$$P(\hat{U}_r > 0) \leq c^r \tag{6.2}$$

for some constant $0 < c < 1$. Therefore, we focus on showing (6.2).

Recall from the observations made at the beginning of this section that the $\{\hat{\tau}_{\mathbf{i}}\}_{\mathbf{i} \in \mathcal{G}_n(t_0)}$ are neither
i.i.d. nor independent of the $\{\hat{N}_{\mathbf{i}}\}_{\mathbf{i} \in \mathcal{G}_n(t_0)}$. More precisely, recall that for each piece of a job requiring
service at server $s_i$ there are $\binom{n-1}{k-1} k!$ possible immediate predecessors of size $k$ (i.e., jobs that also
require service from server $s_i$). Since the arrival of jobs is assumed to follow a Poisson process,
this leads in (3.4) to the inter arrival time between a fixed piece of a job and its unique immediate
predecessor to be exponentially distributed with rate $\lambda_n^*$. The problem arises when the piece of a
job has two or more immediate predecessors, in which case the rate for the exponential changes.

Consider an arrival that is predecessor to two jobs (or two pieces of the same job), $\mathbf{j}_1$ and $\mathbf{j}_2$, and
note that there must be two different servers, say $s_{\mathbf{j}_1}$ and $s_{\mathbf{j}_2}$, that are required by the arriving job
and that are also assigned to jobs $\mathbf{j}_1$ and $\mathbf{j}_2$, respectively. There are only $\binom{n-2}{k-2} k!$ possible jobs of
size $k$ requiring service by servers $s_{\mathbf{j}_1}$ and $s_{\mathbf{j}_2}$, and therefore, the rate at which such a predecessor
arrives is given by

$$\lambda_n^{(2)} = \sum_{k=2}^{m_n} \lambda_k \binom{n-2}{k-2} k!.$$

In general, a job that is predecessor to jobs $\mathbf{j}_1, \mathbf{j}_2, \dots, \mathbf{j}_r$ in the graph arrives at a rate

$$\lambda_n^{(r)} = \sum_{k=r}^{m_n} \lambda_k \binom{n-r}{k-r} k! \leq \sum_{k=1}^{m_n} \lambda_k \binom{n-1}{k-1} k! = \lambda_n^*.$$

As for the lack of independence between the $\{\hat{\tau}_{\mathbf{i}}\}$, note that the inter arrival times between pieces
of jobs that have a common immediate predecessor are dependent. The sequence $\{\hat{\tau}_{\mathbf{i}}\}$ is also
dependent on the $\{\hat{N}_{\mathbf{i}}\}$, since a large number of jobs waiting for a predecessor to arrive increases
the probability of an arriving job being predecessor to two or more pieces at a time. Hence, the
analysis of $P(\hat{U}_r > 0)$ needs some care.

We start by using Markov’s inequality to obtain 


(u r > o') = P ( max Sj > 0 ) < E 

1 

<crT 

> 
i_ 

< E 

i 

<CrT 

[A 
1_ 

v J / 

j(zA r 


j£A r 


Now rewrite the last expectation as follows 






E 


= £® 

e ps n(j G A r ) 


J eAr 

j£N; 



and notice that 


r— 1 


l(j e A-) = l(jfc+l < Aj| fc ), 

k =0 


and is therefore independent of the {fj|fc}£ =1 - Since all labels along a path correspond to different 
jobs, then the vectors {(Nj\ k , X(j|fc,i.), ■ • •, )>fc=o are Li - d - copies of (N , xi, ■ ■ •, Xtv)> where 

the {xi} are i.i.d. copies of X) independent of N. To eliminate the dependence between this last 
sequence and the inter arrival times note that we can replace the {fj|fc}}. =1 with i.i.d. copies of f, 
independent of {{Nj\ k , X(j\k,i), • • • ? X(j|fc, %))}£=() to obtain 


e^ J l(j G A r ) < s .t. e^ j 1 (j G A r ), 
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where < s .t. denotes the usual stochastic order, and the {.Si} were defined at the beginning of this 
section. It follows that 


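$$P(\hat U_r > 0) \le E\Big[ \sum_{\mathbf j \in \hat A_r} e^{\beta \hat S_{\mathbf j}} \Big] \le E\Big[ \sum_{\mathbf j \in \tilde A_r} e^{\beta \tilde S_{\mathbf j}} \Big],$$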


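where the last expectation can be computed using standard weighted branching processes arguments (see, e.g., [18]) and is given by

$$E\Big[ \sum_{\mathbf j \in \tilde A_r} e^{\beta \tilde S_{\mathbf j}} \Big] = \Big( E\Big[ \sum_{i=1}^{\hat N} e^{\beta(\chi_i - \hat\tau_i)} \Big] \Big)^r = \Big( E[\hat N]\, E\big[ e^{\beta(\chi - \hat\tau)} \big] \Big)^r. \tag{6.4}$$

Setting $c = E[\hat N]\, E[e^{\beta(\chi - \hat\tau)}] < 1$ completes the proof. ■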
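The constant $c$ can be evaluated in closed form once distributions are fixed. The following is a minimal sketch under assumptions that are not part of the text: service requirements are taken to be exponential with a hypothetical rate `mu`, and the interarrival times are exponential with rate `lam * mean_n` (the per-server arrival rate, written here as the job arrival rate per server times the mean job size). The function names are illustrative only.

```python
import math


def rho_beta(beta: float, mean_n: float, lam: float, mu: float) -> float:
    """Closed form of E[N] * E[exp(beta * (chi - tau))].

    Assumptions (hypothetical, for illustration only):
    chi ~ Exponential(mu) and tau ~ Exponential(lam * mean_n), independent.
    """
    if not 0.0 <= beta < mu:
        raise ValueError("need 0 <= beta < mu so that E[exp(beta * chi)] is finite")
    lam_star = lam * mean_n                      # arrival rate seen by one server
    mgf_service = mu / (mu - beta)               # E[exp(beta * chi)]
    laplace_gap = lam_star / (lam_star + beta)   # E[exp(-beta * tau)]
    return mean_n * mgf_service * laplace_gap


def best_beta(mean_n: float, lam: float, mu: float) -> float:
    """beta in [0, mu) minimizing rho_beta; it maximizes (mu - beta) * (lam_star + beta)."""
    lam_star = lam * mean_n
    return max(0.0, (mu - lam_star) / 2.0)


if __name__ == "__main__":
    b = best_beta(mean_n=2.0, lam=0.05, mu=1.0)
    print(b, rho_beta(b, mean_n=2.0, lam=0.05, mu=1.0))
```

Under these assumptions a value of `rho_beta` below one at some `beta > 0` is exactly the requirement $c < 1$ used in the bound; at `beta = 0` the function returns the mean job size, which is at least one.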


For the second part of the main theorem we will prove that

$$W^{(n)} \xrightarrow{\;W_p\;} W \qquad \text{as } n \to \infty$$

for any $p \ge 1$, where $W_p$ denotes the Wasserstein distance of order $p$ (see, e.g., [39], Chapter 6). This is equivalent to convergence in distribution plus convergence of all the moments of order up to $p$ (see Theorem 6.8 in [39]).
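For intuition about the metric, on the real line the Wasserstein distance between two empirical measures with the same number of atoms is attained by matching order statistics. The sketch below illustrates that standard fact; it is not part of the paper's argument, and the function name `wasserstein_p` is hypothetical.

```python
def wasserstein_p(xs, ys, p=1.0):
    """Wasserstein distance of order p between two equal-size empirical measures on the line.

    In one dimension the optimal coupling pairs the sorted samples, so the
    distance is the L_p norm of the difference of the order statistics.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("samples must be non-empty and of equal size")
    if p < 1:
        raise ValueError("the order p must be at least 1")
    a, b = sorted(xs), sorted(ys)
    mean_cost = sum(abs(u - v) ** p for u, v in zip(a, b)) / len(a)
    return mean_cost ** (1.0 / p)


if __name__ == "__main__":
    # Shifting every atom by one unit moves the measure a distance of one.
    print(wasserstein_p([0.0, 1.0], [1.0, 2.0], p=2.0))
```

Exhibiting any coupling whose $L_p$ distance vanishes gives an upper bound on $W_p$, which is the strategy followed in the rest of the section.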

To this end, we will consider three different sets of processes that will yield intermediate approximations between $W^{(n)}$ and $W$. In particular, we will show that if $\mu_n$ is the probability measure of $W^{(n)}$, $\hat\nu_k$ is the probability measure of $\bigvee_{r=0}^k \hat U_r$, $\tilde\nu_k$ that of $\bigvee_{r=0}^k \tilde U_r$, $\nu_k$ that of $\bigvee_{r=0}^k U_r$, and, finally, $\mu$ is the probability measure of

$$W = \bigvee_{r=0}^{\infty} U_r,$$

then

$$W_p(\mu_n, \mu) \le W_p(\mu_n, \hat\nu_{r_n}) + W_p(\hat\nu_{r_n}, \tilde\nu_{r_n}) + W_p(\tilde\nu_{r_n}, \nu_{r_n}) + W_p(\nu_{r_n}, \mu) \to 0 \tag{6.5}$$

as $n \to \infty$, for some $r_n \to \infty$.


The technical difficulty in the proofs lies in the need to construct explicit couplings of the pairs of probability measures involved for which we can show that their difference converges to zero in $L_p$ norm. We point out that although $W^{(n)}$ and $\bigvee_{r=0}^{r_n} \hat U_r$, as well as $W$ and $\bigvee_{r=0}^{r_n} U_r$, are naturally defined on the same probability space, all other pairs are not. The proof of the many servers limit part of Theorem 3.2 is based on a series of results.


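Lemma 6.1 Suppose Assumption 3.1 (i)-(ii) is satisfied. Then, for any $r_n \to \infty$ as $n \to \infty$, and any $p \ge 1$, we have that

$$\lim_{n \to \infty} E\Big[ \Big| W^{(n)} - \bigvee_{r=0}^{r_n} \hat U_r \Big|^p \Big] = 0 \qquad \text{and} \qquad \lim_{n \to \infty} E\Big[ \Big| W - \bigvee_{r=0}^{r_n} U_r \Big|^p \Big] = 0.$$

In particular, this implies that, as $n \to \infty$,

$$W_p(\mu_n, \hat\nu_{r_n}) \to 0 \qquad \text{and} \qquad W_p(\nu_{r_n}, \mu) \to 0.$$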
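Proof. Let $\beta > 0$ be the one from Assumption 3.1 (ii) and let $\rho_\beta = E[N]\, E[e^{\beta(\chi - \tau)}] < 1$. Fix $0 < \epsilon < 1 - \rho_\beta$ and note that

$$E[\hat N]\, E\big[ e^{\beta(\chi - \hat\tau)} \big] = E[\hat N]\, E\big[ e^{\beta\chi} \big] \frac{\lambda_n^*}{\lambda_n^* + \beta}.$$

By Assumption 3.1 (i) we have that $E[\hat N] \to E[N]$, and therefore $\lambda_n^* \to \lambda^*$ as $n \to \infty$. Hence,

$$\lim_{n \to \infty} E[\hat N]\, E\big[ e^{\beta(\chi - \hat\tau)} \big] = \rho_\beta. \tag{6.6}$$

It follows that for large enough $n$,

$$E[\hat N]\, E\big[ e^{\beta(\chi - \hat\tau)} \big] < \rho_\beta + \epsilon < 1.$$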


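Next, note that

$$E\Big[ \Big| W^{(n)} - \bigvee_{r=0}^{r_n} \hat U_r \Big|^p \Big] = E\Big[ \Big| \bigvee_{r=0}^{\infty} \hat U_r - \bigvee_{r=0}^{r_n} \hat U_r \Big|^p \Big] = E\Big[ \Big( \Big( \bigvee_{r=r_n+1}^{\infty} \hat U_r - \bigvee_{r=0}^{r_n} \hat U_r \Big)^+ \Big)^p \Big] \le E\Big[ \Big( \bigvee_{r=r_n+1}^{\infty} \hat U_r^+ \Big)^p \Big] \le \sum_{r=r_n+1}^{\infty} E\big[ (\hat U_r^+)^p \big].$$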

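To analyze the last expectation note that by Markov's inequality,

$$E\big[ (\hat U_r^+)^p \big] = \int_0^\infty P\big( (\hat U_r^+)^p > x \big)\, dx = \int_0^\infty P\big( \hat U_r > x^{1/p} \big)\, dx \le E\big[ e^{\beta \hat U_r} \big] \int_0^\infty e^{-\beta x^{1/p}}\, dx = E\Big[ \bigvee_{\mathbf j \in \hat A_r} e^{\beta \hat S_{\mathbf j}} \Big] \frac{p}{\beta^p} \int_0^\infty u^{p-1} e^{-u}\, du,$$

where $\int_0^\infty u^{p-1} e^{-u}\, du = E[\xi^{p-1}] < \infty$ with $\xi$ exponentially distributed with rate one. Letting $C_{\beta,p} = p E[\xi^{p-1}]/\beta^p$ gives

$$E\big[ (\hat U_r^+)^p \big] \le C_{\beta,p}\, E\Big[ \bigvee_{\mathbf j \in \hat A_r} e^{\beta \hat S_{\mathbf j}} \Big] \le C_{\beta,p}\, E\Big[ \sum_{\mathbf j \in \hat A_r} e^{\beta \hat S_{\mathbf j}} \Big].$$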


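Moreover, as shown in the proof of the stability part of Theorem 3.2 (see (6.3) and (6.4)), we have

$$E\Big[ \sum_{\mathbf j \in \hat A_r} e^{\beta \hat S_{\mathbf j}} \Big] \le E\Big[ \sum_{\mathbf j \in \tilde A_r} e^{\beta \tilde S_{\mathbf j}} \Big] = \Big( E[\hat N]\, E\big[ e^{\beta(\chi - \hat\tau)} \big] \Big)^r.$$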


It follows that, for sufficiently large $n$,

$$\sum_{r=r_n+1}^{\infty} E\big[ (\hat U_r^+)^p \big] \le C_{\beta,p} \sum_{r=r_n+1}^{\infty} \Big( E[\hat N]\, E\big[ e^{\beta(\chi - \hat\tau)} \big] \Big)^r \le C_{\beta,p} \sum_{r=r_n+1}^{\infty} (\rho_\beta + \epsilon)^r = O\big( (\rho_\beta + \epsilon)^{r_n} \big)$$

as $n \to \infty$, since $\rho_\beta + \epsilon < 1$.

The proof involving $W$ and the $\{U_r\}_{r \ge 0}$ is essentially the same and is therefore omitted. ■
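The change of variables behind the constant $p E[\xi^{p-1}]/\beta^p$ (substituting $u = \beta x^{1/p}$ in $\int_0^\infty e^{-\beta x^{1/p}} dx$) can be sanity-checked numerically. The sketch below is only an illustration with hypothetical function names; it compares a plain midpoint-rule approximation of the integral with the closed form $p\,\Gamma(p)/\beta^p$.

```python
import math


def moment_constant(beta: float, p: float) -> float:
    """Closed form p * Gamma(p) / beta**p, i.e. p * E[xi**(p-1)] / beta**p for xi ~ Exp(1)."""
    return p * math.gamma(p) / beta ** p


def tail_integral(beta: float, p: float, upper: float, steps: int) -> float:
    """Midpoint-rule approximation of the integral of exp(-beta * x**(1/p)) over [0, upper]."""
    h = upper / steps
    return h * sum(math.exp(-beta * ((k + 0.5) * h) ** (1.0 / p)) for k in range(steps))


if __name__ == "__main__":
    # For beta = 1 and p = 2 the closed form equals 2 * Gamma(2) = 2.
    print(moment_constant(1.0, 2.0), tail_integral(1.0, 2.0, 2500.0, 250000))
```

The truncation point `upper` and the number of steps are arbitrary choices; the integrand is already negligible well before the truncation point used here.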

The following result regarding the contribution of all the paths with multiple labels in the predecessor graph is the most technical one in the paper, since it is where the subtle dependence introduced by the merging of paths plays a role.

Lemma 6.2 For $r \ge 1$ define $M_r = \{\mathbf i \in \hat A_r : \mathbf i \sim \mathbf j \text{ for some } \mathbf j \ne \mathbf i\}$ to be the set of labels in the predecessor graph at graph distance $r$ from the tagged job belonging to jobs with multiple labels. Then, for any $\beta > 0$ and $\hat\rho_\beta = E[\hat N]\, E[e^{\beta(\chi - \hat\tau)}]$,

$$E\Big[ \sum_{\mathbf i \in M_r} e^{\beta \hat S_{\mathbf i}} \Big] \le \frac{r}{n}\, E[\hat N^2]\, (\hat\rho_\beta)^r.$$

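Proof. We first write for $r \ge 1$,

$$E\Big[ \sum_{\mathbf i \in M_r} e^{\beta \hat S_{\mathbf i}} \Big] = \sum_{\mathbf i \in \mathbb{N}_+^r} E\Big[ e^{\beta \hat S_{\mathbf i}} 1(\mathbf i \in M_r) \Big].$$

Now note that along a path all jobs are different and therefore the service requirements $\{\chi_{\mathbf i|k}\}_{k=1}^r$ are i.i.d. with distribution $B$, and are independent of the $\{\hat N_{\mathbf i|k}\}_{k=0}^{r-1}$. The interarrival times $\{\hat\tau_{\mathbf i|k}\}_{k=1}^r$ do depend on the $\{\hat N_{\mathbf i|k}\}_{k=0}^{r-1}$ in the sense that a large number of jobs in the predecessor graph at the time a job arrives increases its probability of being an immediate predecessor to more than one job, and therefore influences the rate of the corresponding $\hat\tau$. It follows that if we replace them by i.i.d. copies of $\hat\tau$ independent of everything else, we obtain

$$E\Big[ e^{\beta \hat S_{\mathbf i}} 1(\mathbf i \in M_r) \Big] \le \Big( E\big[ e^{\beta(\chi - \hat\tau)} \big] \Big)^r P(\mathbf i \in M_r).$$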

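To compute the last probability let $C_{\mathbf i}$ denote the event that $\mathbf i$ is a common immediate predecessor to two or more jobs/fragments in the predecessor graph, and note that

$$1(\mathbf i \in M_r) = \sum_{s=1}^{r} 1\big( (\mathbf i|s-1) \notin M_{s-1},\ (\mathbf i|s) \in M_s \big) \le \sum_{s=1}^{r} 1\big( \mathbf i \in \hat A_r,\ C_{\mathbf i|s} \big),$$

and therefore,

$$\sum_{\mathbf i \in \mathbb{N}_+^r} P(\mathbf i \in M_r) \le \sum_{s=1}^{r} \sum_{\mathbf i \in \mathbb{N}_+^r} E\Big[ \prod_{k=0}^{r-1} 1\big( i_{k+1} \le \hat N_{\mathbf i|k} \big)\, 1(C_{\mathbf i|s}) \Big].$$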

Next, let $\sigma_{\mathbf i} = T_\emptyset - \hat\tau_{\mathbf i|1} - \cdots - \hat\tau_{\mathbf i}$ denote the time at which job $\mathbf i$ arrived to the predecessor graph. Define $\mathcal{F}_t = \sigma\big( (\hat N_{\mathbf j}, s_{\mathbf j}, \chi_{\mathbf j}, \hat\tau_{\mathbf j}) : \sigma_{\mathbf j} > t \big)$ to be the sigma algebra containing the “history” of the predecessor graph over the interval $(t, T_\emptyset]$, and note that $\mathcal{F}_{\sigma_{\mathbf i}}$ does not reveal whether $C_{\mathbf i}$ occurred nor the value of $\hat N_{\mathbf i}$. Now note that for $1 \le s \le r - 1$,

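$$\sum_{\mathbf i \in \mathbb{N}_+^r} E\Big[ \prod_{k=0}^{r-1} 1\big( i_{k+1} \le \hat N_{\mathbf i|k} \big)\, 1(C_{\mathbf i|s}) \Big] = \sum_{\mathbf i \in \mathbb{N}_+^r} E\Big[ \prod_{k=0}^{s-1} 1\big( i_{k+1} \le \hat N_{\mathbf i|k} \big)\, E\Big[ \prod_{k=s}^{r-1} 1\big( i_{k+1} \le \hat N_{\mathbf i|k} \big)\, 1(C_{\mathbf i|s}) \,\Big|\, \mathcal{F}_{\sigma_{\mathbf i|s}} \Big] \Big].$$

Moreover, since $\{\hat N_{\mathbf i|k}\}_{k=s+1}^{r-1}$ are independent of $\mathcal{F}_{\sigma_{\mathbf i|s}}$, then

$$E\Big[ \prod_{k=s}^{r-1} 1\big( i_{k+1} \le \hat N_{\mathbf i|k} \big)\, 1(C_{\mathbf i|s}) \,\Big|\, \mathcal{F}_{\sigma_{\mathbf i|s}} \Big] = E\Big[ 1\big( i_{s+1} \le \hat N_{\mathbf i|s} \big)\, 1(C_{\mathbf i|s}) \,\Big|\, \mathcal{F}_{\sigma_{\mathbf i|s}} \Big] \prod_{k=s+1}^{r-1} P\big( i_{k+1} \le \hat N_{\mathbf i|k} \big),$$

with the convention that $\prod_{i=a}^{b} x_i \equiv 1$ if $a > b$.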

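To analyze the last conditional expectation let $K_t$ be the number of pieces of jobs that are available at time $t$, where by available we mean that they do not have an immediate predecessor in $(t, T_\emptyset]$. Note that the event $C_{\mathbf j}$ is a function of $K_{\sigma_{\mathbf j}}$ and $\hat N_{\mathbf j}$ only; more precisely, for any $\mathbf j \in \mathbb{N}_+^s$ we have

$$P\big( C_{\mathbf j} \,\big|\, \hat N_{\mathbf j}, K_{\sigma_{\mathbf j}} \big) \le \frac{\hat N_{\mathbf j}}{n}.$$

It follows that

$$E\Big[ 1\big( j_{s+1} \le \hat N_{\mathbf j} \big)\, 1(C_{\mathbf j}) \,\Big|\, \mathcal{F}_{\sigma_{\mathbf j}} \Big] = E\Big[ 1\big( j_{s+1} \le \hat N_{\mathbf j} \big)\, P\big( C_{\mathbf j} \,\big|\, \hat N_{\mathbf j}, K_{\sigma_{\mathbf j}} \big) \,\Big|\, \mathcal{F}_{\sigma_{\mathbf j}} \Big] \le E\Big[ 1\big( j_{s+1} \le \hat N_{\mathbf j} \big) \frac{\hat N_{\mathbf j}}{n} \,\Big|\, \mathcal{F}_{\sigma_{\mathbf j}} \Big] = \frac{1}{n} E\Big[ \hat N_{\mathbf j}\, 1\big( j_{s+1} \le \hat N_{\mathbf j} \big) \Big].$$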


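It follows that, for $1 \le s \le r - 1$,

$$\sum_{\mathbf i \in \mathbb{N}_+^r} E\Big[ \prod_{k=0}^{r-1} 1\big( i_{k+1} \le \hat N_{\mathbf i|k} \big)\, 1(C_{\mathbf i|s}) \Big] \le \sum_{\mathbf i \in \mathbb{N}_+^r} E\Big[ \prod_{k=0}^{s-1} 1\big( i_{k+1} \le \hat N_{\mathbf i|k} \big) \Big] \frac{1}{n} E\Big[ \hat N_{\mathbf i|s}\, 1\big( i_{s+1} \le \hat N_{\mathbf i|s} \big) \Big] \prod_{k=s+1}^{r-1} P\big( i_{k+1} \le \hat N_{\mathbf i|k} \big) = \frac{1}{n} \big( E[\hat N] \big)^{r-1} E[\hat N^2].$$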


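For $s = r$ note that the same arguments used above give

$$\sum_{\mathbf i \in \mathbb{N}_+^r} E\Big[ \prod_{k=0}^{r-1} 1\big( i_{k+1} \le \hat N_{\mathbf i|k} \big)\, 1(C_{\mathbf i}) \Big] = \sum_{\mathbf i \in \mathbb{N}_+^r} E\Big[ \prod_{k=0}^{r-1} 1\big( i_{k+1} \le \hat N_{\mathbf i|k} \big)\, P\big( C_{\mathbf i} \,\big|\, \hat N_{\mathbf i}, K_{\sigma_{\mathbf i}} \big) \Big] \le \sum_{\mathbf i \in \mathbb{N}_+^r} \prod_{k=0}^{r-1} P\big( i_{k+1} \le \hat N_{\mathbf i|k} \big) \frac{E[\hat N_{\mathbf i}]}{n} = \frac{1}{n} \big( E[\hat N] \big)^{r+1}.$$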
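We conclude that

$$E\Big[ \sum_{\mathbf i \in M_r} e^{\beta \hat S_{\mathbf i}} \Big] \le \Big( E\big[ e^{\beta(\chi - \hat\tau)} \big] \Big)^r \Big( \sum_{s=1}^{r-1} \frac{1}{n} \big( E[\hat N] \big)^{r-1} E[\hat N^2] + \frac{1}{n} \big( E[\hat N] \big)^{r+1} \Big).$$

Noting that $1 \le (E[\hat N])^2 \le E[\hat N^2]$ completes the proof.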


Lemma 6.3 Let $X_1, X_2, Y_1, Y_2$ be nonnegative random variables and let $p \ge 1$. Then

$$\big( E\big[ |X_1 \vee X_2 - Y_1 \vee Y_2|^p \big] \big)^{1/p} \le \big( E[X_2^p] \big)^{1/p} + \big( E[Y_2^p] \big)^{1/p} + \big( E\big[ |X_1 - Y_1|^p \big] \big)^{1/p}.$$

Proof. First note that for any real numbers $x_1, x_2$ we have that

$$x_1 \vee x_2 = (x_2 - x_1)^+ + x_1,$$

from where we obtain that for $y_1, y_2$ also real,

$$|x_1 \vee x_2 - y_1 \vee y_2| \le (x_2 - x_1)^+ + |x_1 - y_1| + (y_2 - y_1)^+.$$

Moreover, provided $x_1, y_1, x_2, y_2 \ge 0$, we have that

$$|x_1 \vee x_2 - y_1 \vee y_2| \le x_2 + |x_1 - y_1| + y_2.$$

Substituting in the random variables and using Minkowski’s inequality gives the result. ■
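The pointwise inequality at the heart of this proof is easy to spot-check numerically. The sketch below is an illustration only, with a hypothetical helper name; it evaluates the slack $x_2 + |x_1 - y_1| + y_2 - |x_1 \vee x_2 - y_1 \vee y_2|$, which should never be negative for nonnegative inputs.

```python
import random


def lemma_gap(x1: float, x2: float, y1: float, y2: float) -> float:
    """Slack in |max(x1, x2) - max(y1, y2)| <= x2 + |x1 - y1| + y2 (nonnegative inputs)."""
    return x2 + abs(x1 - y1) + y2 - abs(max(x1, x2) - max(y1, y2))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the check is reproducible
    worst = min(
        lemma_gap(*(rng.uniform(0.0, 10.0) for _ in range(4)))
        for _ in range(10000)
    )
    print("smallest slack observed:", worst)
```

A random search of this kind is of course not a proof; it only guards against transcription errors in the inequality.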


The following proposition contains the main coupling between the predecessor graph and its weighted branching tree approximation. Its proof relies on the bound provided by Lemma 6.2.

Proposition 6.4 Suppose that Assumption 3.1 is satisfied. Then, for $\hat\nu_k$, the probability measure of $\bigvee_{r=0}^k \hat U_r$, and $\tilde\nu_k$, the probability measure of $\bigvee_{r=0}^k \tilde U_r$, we have that for any $r_n \to \infty$, and any $p \ge 1$,

$$W_p(\hat\nu_{r_n}, \tilde\nu_{r_n}) \to 0, \qquad n \to \infty.$$


Proof. From the definition of the Wasserstein metric, we need to construct a coupling of $\hat\nu_{r_n}$ and $\tilde\nu_{r_n}$ for which we can show that their $L_p$ distance converges to zero. We will do this by defining a weighted branching tree that will be very close to the predecessor graph restricted to predecessors at graph distance at most $r_n$ of the tagged job (i.e., whose labels are of the form $\mathbf i = (i_1, \dots, i_r)$ with $r \le r_n$). To start, define $M_r = \{\mathbf i \in \hat A_r : \mathbf i \sim \mathbf j \text{ for some } \mathbf j \ne \mathbf i\}$ as in Lemma 6.2. To construct the weighted branching tree we proceed inductively starting from the tagged job.


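Let $\{N'_{\mathbf i}\}_{\mathbf i \in U}$ be a sequence of i.i.d. copies of $\hat N$, let $\{\chi'_{\mathbf i}\}_{\mathbf i \in U}$ be a sequence of i.i.d. copies of $\chi$, and let $\{\tau'_{\mathbf i}\}_{\mathbf i \in U}$ be a sequence of i.i.d. exponential random variables with rate $\lambda_n^*$, all sequences independent of each other and of all other random variables used up to now. Next, let $\tilde A_0 = \{\emptyset\} = \hat A_0$ and $\tilde N_\emptyset = \hat N_\emptyset$, which defines $\tilde A_1 = \{i : 1 \le i \le \tilde N_\emptyset\}$. In general, for $r \ge 1$ and each $\mathbf i \in \tilde A_r$, set

$$\tilde N_{\mathbf i} = \begin{cases} \hat N_{\mathbf i}, & \text{if } \mathbf i \in M_r^c, \\ N'_{\mathbf i}, & \text{otherwise,} \end{cases} \qquad \text{and} \qquad \tilde X_{\mathbf i} = \begin{cases} \hat X_{\mathbf i}, & \text{if } \mathbf i \in M_r^c, \\ \chi'_{\mathbf i} - \tau'_{\mathbf i}, & \text{otherwise.} \end{cases}$$

Then, use the newly defined $\{\tilde N_{\mathbf i}\}_{\mathbf i \in \tilde A_r}$ to construct $\tilde A_{r+1} = \{(\mathbf i, i_{r+1}) : \mathbf i \in \tilde A_r,\ 1 \le i_{r+1} \le \tilde N_{\mathbf i}\}$. We point out that, by construction, $M_r^c \subseteq \hat A_r \cap \tilde A_r$. Also, the $\{(\tilde N_{\mathbf i}, \tilde X_{(\mathbf i,1)}, \dots, \tilde X_{(\mathbf i, \tilde N_{\mathbf i})})\}_{\mathbf i \in U}$ are now i.i.d. with the same distribution as $(\hat N, \hat X_1, \dots, \hat X_{\hat N})$, and therefore define a weighted branching tree.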
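Recall from the beginning of the section that

$$\tilde S_\emptyset = 0, \qquad \tilde S_{\mathbf i} = \tilde X_{\mathbf i|1} + \tilde X_{\mathbf i|2} + \cdots + \tilde X_{\mathbf i}, \quad \mathbf i \ne \emptyset,$$

and

$$\tilde U_r = \bigvee_{\mathbf i \in \tilde A_r} \tilde S_{\mathbf i}.$$

We will show that $E\big[ \big| \bigvee_{r=0}^{r_n} \hat U_r - \bigvee_{r=0}^{r_n} \tilde U_r \big|^p \big] \to 0$ as $n \to \infty$.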


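To this end, note that we can split the paths in $\hat U_r$ and $\tilde U_r$ as follows:

$$\hat U_r = \max\Big\{ \bigvee_{\mathbf i \in M_r^c} \hat S_{\mathbf i},\ \bigvee_{\mathbf i \in M_r} \hat S_{\mathbf i}^+ \Big\} = \max\big\{ \hat U_r^{(1)}, \hat U_r^{(2)} \big\}$$

and

$$\tilde U_r = \max\Big\{ \bigvee_{\mathbf i \in \tilde A_r \cap M_r^c} \tilde S_{\mathbf i},\ \bigvee_{\mathbf i \in \tilde A_r \cap M_r} \tilde S_{\mathbf i}^+ \Big\} = \max\big\{ \tilde U_r^{(1)}, \tilde U_r^{(2)} \big\}.$$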

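Note that $\emptyset \in \hat A_0 \cap M_0^c$, and therefore $\bigvee_{r=0}^{r_n} \hat U_r^{(1)}$ and $\bigvee_{r=0}^{r_n} \tilde U_r^{(1)}$ are nonnegative without having to add the positive parts to the corresponding $\hat S_{\mathbf i}$ and $\tilde S_{\mathbf i}$. Moreover, $\hat S_{\mathbf i} = \tilde S_{\mathbf i}$ for $\mathbf i \in M_r^c$ and therefore, $\hat U_r^{(1)} = \tilde U_r^{(1)}$. It follows from Lemma 6.3 that

$$E\Big[ \Big| \bigvee_{r=0}^{r_n} \hat U_r - \bigvee_{r=0}^{r_n} \tilde U_r \Big|^p \Big] = E\Big[ \Big| \max\Big\{ \bigvee_{r=0}^{r_n} \hat U_r^{(1)}, \bigvee_{r=0}^{r_n} \hat U_r^{(2)} \Big\} - \max\Big\{ \bigvee_{r=0}^{r_n} \tilde U_r^{(1)}, \bigvee_{r=0}^{r_n} \tilde U_r^{(2)} \Big\} \Big|^p \Big] \le \Big( \Big( E\Big[ \Big( \bigvee_{r=0}^{r_n} \hat U_r^{(2)} \Big)^p \Big] \Big)^{1/p} + \Big( E\Big[ \Big( \bigvee_{r=0}^{r_n} \tilde U_r^{(2)} \Big)^p \Big] \Big)^{1/p} \Big)^p.$$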

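To analyze the last two expectations we follow the same approach used in the proof of Lemma 6.1 to obtain

$$E\Big[ \Big( \bigvee_{r=0}^{r_n} \hat U_r^{(2)} \Big)^p \Big] \le C_{\beta,p} \sum_{r=1}^{r_n} E\Big[ \sum_{\mathbf i \in M_r} e^{\beta \hat S_{\mathbf i}} \Big],$$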
where $C_{\beta,p} = p E[\xi^{p-1}]/\beta^p$ with $\xi$ exponentially distributed with rate one, and $\beta > 0$ is the one from Assumption 3.1 (ii). By Lemma 6.2 we have that

$$E\Big[ \sum_{\mathbf i \in M_r} e^{\beta \hat S_{\mathbf i}} \Big] \le \frac{r}{n}\, E[\hat N^2] \Big( E[\hat N]\, E\big[ e^{\beta(\chi - \hat\tau)} \big] \Big)^r.$$


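By (6.6) we have that $E[\hat N]\, E[e^{\beta(\chi - \hat\tau)}] \to \rho_\beta$ as $n \to \infty$. We conclude that, for any $0 < \epsilon < 1 - \rho_\beta$ and sufficiently large $n$,

$$E\Big[ \Big( \bigvee_{r=0}^{r_n} \hat U_r^{(2)} \Big)^p \Big] \le C_{\beta,p} \frac{E[\hat N^2]}{n} \sum_{r=1}^{r_n} r \Big( E[\hat N]\, E\big[ e^{\beta(\chi - \hat\tau)} \big] \Big)^r \le C_{\beta,p} \frac{E[\hat N^2]}{n} \sum_{r=1}^{\infty} r (\rho_\beta + \epsilon)^r = O\Big( \frac{E[\hat N^2]}{n} \Big).$$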


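The proof that $E\big[ \big( \bigvee_{r=0}^{r_n} \tilde U_r^{(2)} \big)^p \big] = O\big( E[\hat N^2]/n \big)$ follows the same steps and is therefore omitted. We have thus shown that

$$E\Big[ \Big| \bigvee_{r=0}^{r_n} \hat U_r - \bigvee_{r=0}^{r_n} \tilde U_r \Big|^p \Big] = O\Big( \frac{E[\hat N^2]}{n} \Big)$$

as $n \to \infty$, which in turn implies that $W_p(\hat\nu_{r_n}, \tilde\nu_{r_n}) \to 0$ by Assumption 3.1 (iii). ■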

The following, and last, preliminary result provides a coupling for two weighted branching trees. As pointed out earlier, this step is unnecessary if $f_n = f$ for all $n$ sufficiently large.

Proposition 6.5 Suppose that Assumption 3.1 (i)-(ii) is satisfied. Then, for $\tilde\nu_k$ the probability measure of $\bigvee_{r=0}^k \tilde U_r$, $\nu_k$ the probability measure of $\bigvee_{r=0}^k U_r$, any $p \ge 1$ and any $r_n \to \infty$ such that

$$\lim_{n \to \infty} \big| E[\hat N] - E[N] \big| \big( E[N] \big)^{r_n/p} r_n = 0,$$

we have that

$$W_p(\tilde\nu_{r_n}, \nu_{r_n}) \to 0, \qquad n \to \infty.$$


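Proof. We start by constructing a coupling of $\tilde\nu_{r_n}$ and $\nu_{r_n}$. Let $\{\xi_{\mathbf i}\}_{\mathbf i \in U}$ and $\{\zeta_{\mathbf i}\}_{\mathbf i \in U}$ be two independent sequences of i.i.d. Uniform(0,1) random variables. Let $F_n(k) = \sum_{j=1}^{k} f_n(j)$, $F(k) = \sum_{j=1}^{k} f(j)$ and set

$$\tilde\tau_{\mathbf i} = -\frac{1}{\lambda_n^*} \log \xi_{\mathbf i} \qquad \text{and} \qquad \tau_{\mathbf i} = -\frac{1}{\lambda^*} \log \xi_{\mathbf i},$$

$$\tilde N_{\mathbf i} = F_n^{-1}(\zeta_{\mathbf i}) \qquad \text{and} \qquad N_{\mathbf i} = F^{-1}(\zeta_{\mathbf i}),$$

where $g^{-1}(t) = \inf\{x \in \mathbb{R} : g(x) \ge t\}$ (this is the standard inverse transform construction).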
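The inverse transform coupling can be sketched in a few lines. The code below is a minimal illustration, assuming the two job-size distributions are given as finite lists of probabilities on $\{1, \dots, m\}$; the function names and the example distributions are hypothetical and not taken from the paper.

```python
import bisect
import itertools
import math


def make_inverse(pmf):
    """Generalized inverse of the cdf of a distribution on {1, ..., m}.

    pmf[k - 1] is the probability of the value k; the returned function maps
    u in (0, 1) to the smallest k whose cumulative probability is at least u.
    """
    cdf = list(itertools.accumulate(pmf))

    def inverse(u: float) -> int:
        k = bisect.bisect_left(cdf, u) + 1
        return min(k, len(pmf))  # guards against rounding in the last cdf entry

    return inverse


def coupled_sizes(inverse_n, inverse_limit, zeta: float):
    """Job sizes in the two trees driven by the same uniform zeta."""
    return inverse_n(zeta), inverse_limit(zeta)


def coupled_gaps(xi: float, rate_n: float, rate_limit: float):
    """Exponential interarrival times in the two trees driven by the same uniform xi."""
    return -math.log(xi) / rate_n, -math.log(xi) / rate_limit


if __name__ == "__main__":
    inv_n, inv = make_inverse([0.4, 0.6]), make_inverse([0.5, 0.5])
    print(coupled_sizes(inv_n, inv, 0.45))   # the two sizes disagree for this uniform
    print(coupled_gaps(0.5, 2.0, 1.0))
```

Sharing the uniforms makes the two sizes agree except on a set of uniforms whose total length is controlled by the distance between the two distributions, and makes the two interarrival times proportional to each other.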

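Now use the $\{\tilde N_{\mathbf i}\}$ and the $\{N_{\mathbf i}\}$ to construct two branching trees according to $\tilde A_0 = \{\emptyset\} = A_0$ and $\tilde A_r = \{(\mathbf i, i_r) : \mathbf i \in \tilde A_{r-1},\ 1 \le i_r \le \tilde N_{\mathbf i}\}$, $A_r = \{(\mathbf i, i_r) : \mathbf i \in A_{r-1},\ 1 \le i_r \le N_{\mathbf i}\}$ for $r \ge 1$. It remains to construct the sequences of service requirements. For the weighted branching tree marked with tildes we sample i.i.d. random variables $\{\chi_{\mathbf i}\}_{\mathbf i \in U}$ having distribution $B$, independent of all other random variables, and set $\tilde X_{\mathbf i} = \chi_{\mathbf i} - \tilde\tau_{\mathbf i}$ for each $\mathbf i \in U$. Then let $X_{\mathbf i} = \chi_{\mathbf i} - \tau_{\mathbf i}$.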

To analyze the difference between the corresponding processes $\tilde U_r$ and $U_r$, we start by defining the notion of a miscoupling. We say that there has been a miscoupling at node $\mathbf i \in \tilde A_r$ if $\tilde N_{\mathbf i} \ne N_{\mathbf i}$ and $\tilde N_{\mathbf i|k} = N_{\mathbf i|k}$ for all $0 \le k < r$. Next, define $C_r = \{\mathbf i \in \tilde A_r : \tilde N_{\mathbf i|k} = N_{\mathbf i|k} \text{ for all } 0 \le k < r\}$, which corresponds to the set of individuals in both trees that have no miscouplings along their paths.

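Following the same steps used in the proof of Proposition 6.4 and with some abuse of notation, split the paths in $\tilde U_r$ and $U_r$ as follows:

$$\tilde U_r = \max\Big\{ \bigvee_{\mathbf i \in \tilde A_r \cap C_r} \tilde S_{\mathbf i}^+,\ \bigvee_{\mathbf i \in \tilde A_r \cap C_r^c} \tilde S_{\mathbf i}^+ \Big\} = \max\big\{ \tilde U_r^{(1)}, \tilde U_r^{(2)} \big\},$$

and

$$U_r = \max\Big\{ \bigvee_{\mathbf i \in A_r \cap C_r} S_{\mathbf i}^+,\ \bigvee_{\mathbf i \in A_r \cap C_r^c} S_{\mathbf i}^+ \Big\} = \max\big\{ U_r^{(1)}, U_r^{(2)} \big\}.$$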


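By using Lemma 6.3 we obtain

$$\Big( E\Big[ \Big| \bigvee_{r=0}^{r_n} \tilde U_r - \bigvee_{r=0}^{r_n} U_r \Big|^p \Big] \Big)^{1/p} = \Big( E\Big[ \Big| \max\Big\{ \bigvee_{r=0}^{r_n} \tilde U_r^{(1)}, \bigvee_{r=0}^{r_n} \tilde U_r^{(2)} \Big\} - \max\Big\{ \bigvee_{r=0}^{r_n} U_r^{(1)}, \bigvee_{r=0}^{r_n} U_r^{(2)} \Big\} \Big|^p \Big] \Big)^{1/p}$$

$$\le \Big( E\Big[ \Big( \bigvee_{r=0}^{r_n} \tilde U_r^{(2)} \Big)^p \Big] \Big)^{1/p} + \Big( E\Big[ \Big( \bigvee_{r=0}^{r_n} U_r^{(2)} \Big)^p \Big] \Big)^{1/p} \tag{6.7}$$

$$\quad + \Big( E\Big[ \Big| \bigvee_{r=0}^{r_n} \tilde U_r^{(1)} - \bigvee_{r=0}^{r_n} U_r^{(1)} \Big|^p \Big] \Big)^{1/p}. \tag{6.8}$$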


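The analysis of the two expectations in (6.7) is very similar to the approach used in the proof of Proposition 6.4, so we will skip many of the intermediate steps. First, we obtain that

$$E\Big[ \Big( \bigvee_{r=0}^{r_n} \tilde U_r^{(2)} \Big)^p \Big] \le C_{\beta,p} \sum_{r=1}^{r_n} E\Big[ \sum_{\mathbf i \in \tilde A_r \cap C_r^c} e^{\beta \tilde S_{\mathbf i}} \Big],$$

where $C_{\beta,p}$ is a finite constant and $\beta > 0$ is the same one from Assumption 3.1. To compute the expectation on the right hand side let $\mathcal{G}_k = \sigma\big( (\tilde N_{\mathbf i}, N_{\mathbf i}, \tilde X_{(\mathbf i,1)}, \dots, \tilde X_{(\mathbf i, \tilde N_{\mathbf i})}) : \mathbf i \in \tilde A_s,\ 0 \le s < k \big)$ for $k \ge 1$ and note that for $r \ge 2$,

$$a_r = E\Big[ \sum_{\mathbf i \in \tilde A_r \cap C_r^c} e^{\beta \tilde S_{\mathbf i}} \Big]$$

$$= E\Big[ \sum_{\mathbf i \in \tilde A_{r-1} \cap C_{r-1}^c} \sum_{j=1}^{\tilde N_{\mathbf i}} e^{\beta \tilde S_{(\mathbf i,j)}} \Big] + E\Big[ \sum_{\mathbf i \in \tilde A_{r-1} \cap C_{r-1}} \sum_{j=1}^{\tilde N_{\mathbf i}} e^{\beta \tilde S_{(\mathbf i,j)}}\, 1(\tilde N_{\mathbf i} \ne N_{\mathbf i}) \Big]$$

$$= E\Big[ \sum_{\mathbf i \in \tilde A_{r-1} \cap C_{r-1}^c} e^{\beta \tilde S_{\mathbf i}}\, E\Big[ \sum_{j=1}^{\tilde N_{\mathbf i}} e^{\beta \tilde X_{(\mathbf i,j)}} \,\Big|\, \mathcal{G}_{r-1} \Big] \Big] + E\Big[ \sum_{\mathbf i \in \tilde A_{r-1} \cap C_{r-1}} e^{\beta \tilde S_{\mathbf i}}\, E\Big[ \sum_{j=1}^{\tilde N_{\mathbf i}} e^{\beta \tilde X_{(\mathbf i,j)}}\, 1(\tilde N_{\mathbf i} \ne N_{\mathbf i}) \,\Big|\, \mathcal{G}_{r-1} \Big] \Big]$$

$$= \hat\rho_\beta\, a_{r-1} + E\big[ \hat N\, 1(\hat N \ne N) \big]\, E\big[ e^{\beta(\chi - \hat\tau)} \big]\, E\Big[ \sum_{\mathbf i \in \tilde A_{r-1} \cap C_{r-1}} e^{\beta \tilde S_{\mathbf i}} \Big],$$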
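where $\hat\rho_\beta = E[\hat N]\, E[e^{\beta(\chi - \hat\tau)}]$ and $(\hat N, N) = (F_n^{-1}(\zeta), F^{-1}(\zeta))$ in the first of the last three expectations, with $\zeta$ Uniform(0,1). Letting $\mathcal{E}_n = E[\hat N\, 1(\hat N \ne N)]\, E[e^{\beta(\chi - \hat\tau)}]$ and noting that

$$E\Big[ \sum_{\mathbf i \in \tilde A_{r-1} \cap C_{r-1}} e^{\beta \tilde S_{\mathbf i}} \Big] \le E\Big[ \sum_{\mathbf i \in \tilde A_{r-1}} e^{\beta \tilde S_{\mathbf i}} \Big] = (\hat\rho_\beta)^{r-1}$$

gives

$$a_r \le \hat\rho_\beta\, a_{r-1} + \mathcal{E}_n (\hat\rho_\beta)^{r-1}.$$

Iterating this recursion $r - 1$ times, and using that $a_1 = \mathcal{E}_n$, gives

$$a_r \le (\hat\rho_\beta)^{r-1} a_1 + (r-1)\, \mathcal{E}_n (\hat\rho_\beta)^{r-1} = \mathcal{E}_n\, r\, (\hat\rho_\beta)^{r-1}.$$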

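Since by (6.6) we have that $\hat\rho_\beta \to \rho_\beta = E[N]\, E[e^{\beta(\chi - \tau)}]$ as $n \to \infty$, then for $0 < \epsilon < 1 - \rho_\beta$ and $n$ sufficiently large,

$$E\Big[ \Big( \bigvee_{r=0}^{r_n} \tilde U_r^{(2)} \Big)^p \Big] \le C_{\beta,p}\, \mathcal{E}_n \sum_{r=1}^{r_n} r (\hat\rho_\beta)^{r-1} \le C_{\beta,p}\, \mathcal{E}_n \sum_{r=1}^{\infty} r (\rho_\beta + \epsilon)^{r-1} = O(\mathcal{E}_n)$$

as $n \to \infty$. The proof for the expectation involving the $\{U_r^{(2)}\}$ is symmetric with respect to the notation, so we obtain

$$E\Big[ \Big( \bigvee_{r=0}^{r_n} \tilde U_r^{(2)} \Big)^p \Big] + E\Big[ \Big( \bigvee_{r=0}^{r_n} U_r^{(2)} \Big)^p \Big] = O(\mathcal{E}_n). \tag{6.9}$$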


It remains to analyze the expectation in (6.8). 
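The key idea to do this is to note that

$$\tilde U_r^{(1)} = \max_{\mathbf j \in C_r} (S_{\mathbf j} + B_{\mathbf j})^+,$$

where

$$B_{\mathbf j} = \Big( 1 - \frac{\lambda^*}{\lambda_n^*} \Big) \big( \tau_{\mathbf j|1} + \tau_{\mathbf j|2} + \cdots + \tau_{\mathbf j} \big).$$

By using the inequality

$$\Big| \max_{1 \le i \le k} (x_i + y_i) - \max_{1 \le i \le k} x_i \Big| \le \max_{1 \le i \le k} |y_i|$$

for any sequences of real numbers $\{x_i\}_{i \ge 1}$ and $\{y_i\}_{i \ge 1}$ and any $k \ge 1$, we obtain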


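$$E\Big[ \Big| \bigvee_{r=0}^{r_n} \tilde U_r^{(1)} - \bigvee_{r=0}^{r_n} U_r^{(1)} \Big|^p \Big] \le E\Big[ \bigvee_{r=0}^{r_n} \bigvee_{\mathbf i \in C_r} |B_{\mathbf i}|^p \Big] \le E\Big[ \bigvee_{\mathbf i \in A_{r_n}} |B_{\mathbf i}|^p \Big],$$

where for the last inequality we used the observation that if $\mathbf i \in C_r$ for some $r < r_n$, then there is at least one $\mathbf j \in A_{r_n}$ such that $(\mathbf j|r) = \mathbf i$ (recall that $f_n(0) = f(0) = 0$), and since all the $\{\tau_{\mathbf j}\}$ are nonnegative, $|B_{\mathbf i}| \le |B_{\mathbf j}|$. To estimate the last expectation note that since the $\{\tau_{\mathbf i}\}_{\mathbf i \in U}$ are independent of the $\{N_{\mathbf i}\}_{\mathbf i \in U}$, we have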


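$$E\Big[ \bigvee_{\mathbf i \in A_{r_n}} |B_{\mathbf i}|^p \Big] \le E\Big[ \sum_{\mathbf i \in A_{r_n}} |B_{\mathbf i}|^p \Big] = E\big[ |A_{r_n}| \big]\, \Big| \frac{1}{\lambda_n^*} - \frac{1}{\lambda^*} \Big|^p E\big[ Y_{r_n}^p \big],$$

where $Y_{r_n}$ is an Erlang random variable with parameters $(r_n, 1)$. Since $E[|A_{r_n}|] = (E[N])^{r_n}$,

$$\Big| \frac{1}{\lambda_n^*} - \frac{1}{\lambda^*} \Big| = \frac{\big| E[\hat N] - E[N] \big|}{\lambda\, E[\hat N]\, E[N]},$$

and

$$E\big[ Y_{r_n}^p \big] = \int_0^\infty \frac{x^{r_n + p - 1} e^{-x}}{(r_n - 1)!}\, dx = \frac{\Gamma(r_n + p)}{\Gamma(r_n)},$$

where $\Gamma(t)$ is the gamma function, we have that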

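$$E\Big[ \Big| \bigvee_{r=0}^{r_n} \tilde U_r^{(1)} - \bigvee_{r=0}^{r_n} U_r^{(1)} \Big|^p \Big] \le \frac{\big| E[\hat N] - E[N] \big|^p}{\lambda^p} \big( E[N] \big)^{r_n} \frac{\Gamma(r_n + p)}{\Gamma(r_n)} = O\Big( \big| E[\hat N] - E[N] \big|^p \big( E[N] \big)^{r_n} r_n^p \Big) \tag{6.10}$$

as $n \to \infty$, where in the last step we used that $\lim_{k \to \infty} \Gamma(k) k^a / \Gamma(k + a) = 1$ for any $a \in \mathbb{R}$.

Combining (6.9) and (6.10) with (6.7) and (6.8), gives

$$\Big( E\Big[ \Big| \bigvee_{r=0}^{r_n} \tilde U_r - \bigvee_{r=0}^{r_n} U_r \Big|^p \Big] \Big)^{1/p} = O\Big( \mathcal{E}_n^{1/p} + \big| E[\hat N] - E[N] \big| \big( E[N] \big)^{r_n/p} r_n \Big),$$

where

$$\big| E[\hat N] - E[N] \big| \big( E[N] \big)^{r_n/p} r_n \to 0$$

as $n \to \infty$ by assumption. To see that $\mathcal{E}_n \to 0$ as well, note that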


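$$\mathcal{E}_n = \frac{\hat\rho_\beta}{E[\hat N]} \sum_{k=1}^{m_n} k\, P(N \ne k \mid \hat N = k)\, f_n(k) \le \hat\rho_\beta \max_{1 \le k \le m_n} P(N \ne k \mid \hat N = k).$$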


Since $\hat\rho_\beta \to \rho_\beta < 1$, it only remains to show that the maximum on the right hand side converges to zero. To see this is the case let $R_k = \{u \in (0,1) : F_n^{-1}(u) = k\}$ and note that for any $1 \le k \le m_n$,

$$P(N \ne k \mid \hat N = k) = \int_{R_k} \big\{ 1\big( F^{-1}(u) \le k - 1 \big) + 1\big( F^{-1}(u) \ge k + 1 \big) \big\}\, du$$

$$= \int_{R_k} \big\{ 1\big( F^{-1}(u) \le F_n^{-1}(u) - 1 \big) + 1\big( F^{-1}(u) \ge F_n^{-1}(u) + 1 \big) \big\}\, du$$

$$= \int_{R_k} 1\big( |F^{-1}(u) - F_n^{-1}(u)| \ge 1 \big)\, du$$

$$\le \int_0^1 \big| F^{-1}(u) - F_n^{-1}(u) \big|\, du,$$

where the last integral is the Wasserstein distance of order one ($W_1$) between distributions $f_n$ and $f$ (see, e.g., [21, 30]), which converges to zero since $f_n \Rightarrow f$ and $E[\hat N] \to E[N]$ (by Assumption 3.1 (i)), which is equivalent to convergence in $W_1$ (see Theorem 6.8 in [39]). This completes the proof. ■
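For distributions on the positive integers the order-one Wasserstein distance in the last display can be computed either from the quantile functions or, equivalently, from the cumulative distribution functions. The sketch below illustrates both computations on hypothetical example distributions; the function names are not from the paper, and the quantile version uses a simple midpoint grid rather than an exact integral.

```python
import bisect
import itertools


def w1_from_cdfs(pmf_a, pmf_b) -> float:
    """W_1 between two laws on {1, ..., m}: the sum over k of |F_a(k) - F_b(k)|."""
    m = max(len(pmf_a), len(pmf_b))
    pad_a = list(pmf_a) + [0.0] * (m - len(pmf_a))
    pad_b = list(pmf_b) + [0.0] * (m - len(pmf_b))
    cdf_a = itertools.accumulate(pad_a)
    cdf_b = itertools.accumulate(pad_b)
    return sum(abs(fa - fb) for fa, fb in zip(cdf_a, cdf_b))


def w1_from_quantiles(pmf_a, pmf_b, steps: int = 1000) -> float:
    """Midpoint-grid approximation of the integral of |F_a^{-1}(u) - F_b^{-1}(u)| over (0, 1)."""
    cdf_a = list(itertools.accumulate(pmf_a))
    cdf_b = list(itertools.accumulate(pmf_b))

    def quantile(cdf, u):
        # smallest k with cdf(k) >= u, capped to guard against rounding
        return min(bisect.bisect_left(cdf, u) + 1, len(cdf))

    total = 0.0
    for j in range(steps):
        u = (j + 0.5) / steps
        total += abs(quantile(cdf_a, u) - quantile(cdf_b, u))
    return total / steps


if __name__ == "__main__":
    print(w1_from_cdfs([0.5, 0.5], [0.4, 0.6]), w1_from_quantiles([0.5, 0.5], [0.4, 0.6]))
```

In this example the two quantile functions differ by one unit on a set of uniforms of length 0.1, which is also the probability that the coupled job sizes disagree.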


Now that we have all the convergence results for each of the pairs of probability measures involved in (6.5), we can give the proof of the many servers limit of Theorem 3.2.


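Proof of Theorem 3.2 (Many servers limit). Let $\varphi(n) = |E[\hat N] - E[N]|$, which converges to zero as $n \to \infty$ since $f_n$ is uniformly integrable by Assumption 3.1. Now let

$$r_n = \begin{cases} \dfrac{p}{2 \log E[N]}\, |\log \varphi(n)|, & \text{if } E[N] > 1, \\[2mm] \varphi(n)^{-1/2}, & \text{if } E[N] = 1, \end{cases}$$

and note that

$$\big| E[\hat N] - E[N] \big| \big( E[N] \big)^{r_n/p} r_n = \begin{cases} \dfrac{p}{2 \log E[N]}\, \varphi(n)^{1/2}\, |\log \varphi(n)|, & \text{if } E[N] > 1, \\[2mm] \varphi(n)^{1/2}, & \text{if } E[N] = 1, \end{cases}$$

which converges to zero as $n \to \infty$ in both cases.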
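The choice of the truncation depth can be checked numerically. The sketch below is an illustration under the assumption that the discrepancy between the two mean job sizes lies strictly between zero and one; the names `truncation_depth` and `coupling_rate` are hypothetical. It evaluates the depth and the resulting product, which should match the closed-form expressions in the display.

```python
import math


def truncation_depth(phi: float, mean_n: float, p: float) -> float:
    """Depth r_n as a function of the discrepancy phi in (0, 1) and the limiting mean job size."""
    if not 0.0 < phi < 1.0:
        raise ValueError("this sketch assumes 0 < phi < 1")
    if mean_n > 1.0:
        return p * abs(math.log(phi)) / (2.0 * math.log(mean_n))
    return phi ** -0.5  # case of mean job size equal to one


def coupling_rate(phi: float, mean_n: float, p: float) -> float:
    """The product phi * mean_n**(r_n / p) * r_n that Proposition 6.5 needs to vanish."""
    r = truncation_depth(phi, mean_n, p)
    return phi * mean_n ** (r / p) * r


if __name__ == "__main__":
    for phi in (1e-2, 1e-4, 1e-8):
        print(phi, truncation_depth(phi, 2.0, 2.0), coupling_rate(phi, 2.0, 2.0))
```

The depth grows only logarithmically in the inverse discrepancy when the mean job size exceeds one, which is what keeps the product small despite the exponential growth of the tree.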


That $\lim_{n \to \infty} W_p(\mu_n, \mu) = 0$ as required is an immediate consequence of (6.5) combined with Lemma 6.1, Proposition 6.4, and Proposition 6.5. ■


The last proof in the paper is that of Theorem 3.4.

Proof of Theorem 3.4. We need to verify the conditions of Theorem 3.4 in m- The non-arithmetic condition is immediate from the observation that the $\{\tau_i\}$ are exponentially distributed and independent of $(N, \chi_1, \dots, \chi_N)$. The derivative and root conditions follow from the assumptions by noting that


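$$E\Big[ \sum_{i=1}^{N} e^{\theta(\chi_i - \tau_i)} (\chi_i - \tau_i) \Big] = \frac{E[N] \lambda^*}{(\lambda^* + \theta)^2} \Big( (\lambda^* + \theta)\, E\big[ e^{\theta\chi} \chi \big] - E\big[ e^{\theta\chi} \big] \Big)$$

and

$$E\Big[ \sum_{i=1}^{N} e^{\theta(\chi_i - \tau_i)} \Big] = E[N]\, E\big[ e^{\theta\chi} \big]\, \lambda^* / (\lambda^* + \theta).$$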


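To verify condition 1 note that for $\theta > 1$, Lemma 4.1 in [18] gives (using $C_i \equiv 1$ for all $i$),

$$E\Big[ \Big( \sum_{i=1}^{N} e^{\chi_i - \tau_i} \Big)^{\theta} \Big] \le \Big( E\big[ e^{(p-1)(\chi - \tau)} \big] \Big)^{\theta/(p-1)} E[N^\theta] + E\Big[ \sum_{i=1}^{N} e^{\theta(\chi_i - \tau_i)} \Big] \le E\big[ e^{\theta(\chi - \tau)} \big]\, E[N^\theta] + 1 \quad \text{(by Jensen’s inequality)},$$

where $p = \lceil \theta \rceil$. For $0 < \theta \le 1$, the same arguments give for any $0 < \epsilon < 1$,

$$E\Big[ \Big( \sum_{i=1}^{N} e^{\theta(\chi_i - \tau_i)/(1+\epsilon)} \Big)^{1+\epsilon} \Big] \le \Big( E\big[ e^{\theta(\chi - \tau)/(1+\epsilon)} \big] \Big)^{1+\epsilon} E[N^{1+\epsilon}] + E\Big[ \sum_{i=1}^{N} e^{\theta(\chi_i - \tau_i)} \Big] \le E\big[ e^{\theta(\chi - \tau)} \big]\, E[N^{1+\epsilon}] + 1.$$

Since $E[N^{\theta \vee (1+\epsilon)}] < \infty$ by assumption, all the conditions of the theorem are satisfied.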